A multi-agent self-managing project framework. Install once, and a team of Claude agents takes over your project: planning, coding, reviewing, and adjusting course autonomously — until convergence or token exhaustion.
┌─────────────────────────────────────────┐
│ SUPERVISOR (Claude, "the brain")        │  L3-L6 meta-decisions
│   - patrol / write reports              │  + planning duties
│   - decide escalation paths             │  (calls tm-plan-cycle
│   - edit goal.md / promote workspace    │  directly when queue
│   - decide when to shutdown             │  runs low)
└────────────────────┬────────────────────┘
                     │ note / add_task / shutdown
                     ▼
          ┌──────────────────────┐         ┌────────────────────┐
          │  pm-daemon.py        │────────▶│  EXECUTOR × N      │
          │  Python (dispatcher) │         │  Claude (work)     │
          │  ✓ deterministic     │         │  edits workspace/  │
          │  ✓ zero-token        │         │  tm-done reports   │
          │  ✓ FOREVER mode      │         │                    │
          └──────────────────────┘         └────────────────────┘
                     ▲
                     │ events/*.json (file mailbox)
                     │
            all roles decoupled
Default is 3-agent (supervisor + N executors). Pass --with-planner to spawn a separate PLANNER Claude session — useful when the supervisor's combined load gets too dense, or you want planner / supervisor decisions independently auditable.
Core idea: use LLMs where they earn their keep (creativity, judgment, code authoring); leave the dispatch plumbing to deterministic Python (assigning tasks, tracking state, nagging, recycling crashed workers).
Built-in maturity ladder, all levels in place (L0–L6):
| Level | Capability | How it works |
|---|---|---|
| L0 | Single agent works on tasks | One Claude window |
| L1 | Multiple agents in parallel | PM dispatches to N executors via lock-free file mailbox |
| L2 | Planner generates new tasks continuously | tm-profile writes per-project manifest.json; planner drives off it; 5-min clean cooldown |
| L3 | Supervisor makes meta-decisions | tm-claude-supervisor + status reports + ADR decision logs |
| L4 | Agents harden their own source | Executor edits workspace/ |
| L5 | Workspace fixes auto-flow back to production | tm-promote with stale-snapshot detection |
| L6 | Goal evolves autonomously | tm-supervise revise-goal + version snapshots |
| Innovation channel | Human + LLM both inject vision | vision.md (you write) + tm-vision propose (LLM brainstorm) → supervisor promotes when ready |
Note on L7: you might expect the ladder to continue to L7 (cross-project orchestration). It is intentionally omitted. L7 is a different category of problem (multi-tenancy, cross-project state aggregation) requiring a base-layer rewrite. 99% of the "I want to manage multiple projects" use cases are solved by running tm-init + tm-team-up for each project independently, which is effectively a simplified L7 with zero extra code. A full L7 belongs in a separate multi-tenant project, not this framework's next version.
This system is good at gap-filling, not invention: the planning loop reads manifest.json, finds undone work, hardens code, fills coverage gaps. It cannot dream up "this should become a SaaS" / "this needs a web UI" / "this should run cross-project" — those are leaps outside the closed loop.
The fix: humans are the source of innovation. Two channels feed novel directions into the system:
| Channel | Source | Stored at | Read by |
|---|---|---|---|
| vision.md | You (human) write directly | repo root | supervisor each cycle (and planner each cycle if --with-planner) |
| proposals/ | LLM brainstorm via tm-vision propose | proposals/<iso>-<slug>.md | supervisor decides promote / defer / reject |
Vision items do not enter the task queue directly. The supervisor is a gating layer — when something looks ripe, it's promoted into goal.md via tm-supervise revise-goal, after which the planning loop naturally generates tasks toward it. This gating prevents LLM-generated "ideas" from hijacking the project's direction.
Roles in detail:
- PM (Python daemon): dispatches, tracks, GCs, escalates. Zero token cost.
- SUPERVISOR (Claude): the project manager. Handles status reports, decisions, course corrections, and shutdown. In default 3-agent mode it also handles planning, calling tm-plan-cycle add directly when the queue runs low. Planning dimensions are derived from the project itself: tm-profile reads workspace/ and writes manifest.json with 3–7 dimensions tailored to THIS code (candidate pool covers correctness/robustness/perf/security/features/UI/docs/distribution/integrations/data_model, plus domain-specific axes like frame_pacing for games, audio_latency for audio apps, numerical_stability for physics sims). Dimensions that don't apply to this project don't enter the manifest, saving tokens on irrelevant audits.
- PLANNER (Claude, opt-in): separate planner session, enabled with tm-team-up --with-planner. Identical responsibilities to the supervisor's planning role; use when the supervisor's combined load is too dense, or when you want planner / supervisor decisions independently auditable. The 4-agent layout (supervisor + planner + executors) was the original design; 3-agent became the default after measuring lower steady-state token cost.
- EXECUTOR × N (Claude): the hands. Edits code in workspace/. Multiple executors run in parallel.
The bundled demo points workspace/ at a copy of the framework itself, so the agent team optimizes the very code it's made of.
cd ZeroProgramer
./bin/setup-self-optimize # copies framework into workspace/
./bin/tm-team-up 2 # 1 supervisor + 2 executors, headless; tm-web auto-opens in browser
./bin/tm-web # (already auto-opened, but you can re-open any time)

Default mode is headless agents + browser dashboard: no terminal windows pop open. Pass --windows for legacy per-agent terminal windows, --native for the experimental tkinter window, or --with-planner to spawn a 4-agent team.
Each agent auto-submits go and enters its role-specific loop without manual input.
# 1. clone the framework
git clone https://github.com/Hosico02/ZeroProgramer ~/tools/zeroprogramer
# 2. bootstrap into any target directory
~/tools/zeroprogramer/bin/tm-init ~/my-project
# 3. configure project goals
cd ~/my-project
$EDITOR goal.md # describe what 'done' looks like
cp -r ~/source/* workspace/ # populate workspace with code to be managed
# 4. launch
./bin/tm-pm-up # PM + browser, click ▶ 启动 in dashboard
# or: ./bin/tm-team-up 2 # one-shot, agents start immediately

tm-init creates:
- bin/: the toolkit
- .claude/settings.json: hooks + permissions
- CLAUDE.md: executor instructions
- goal.md and vision.md templates
- workspace/: the only directory executors can edit
- .gitignore: runtime state stays out of git
The agent team is a gap-filler — it audits workspace/ for missing
work and queues tasks. With an empty workspace there's nothing to audit,
so the loop never starts. tm-bootstrap bridges the gap with a single
claude -p call that turns goal.md into an initial plan + scaffold:
# 1. setup as Option B steps 1–2
cd ~/my-doudizhu
# 2. write goal.md (a paragraph or two is enough)
cat > goal.md <<'EOF'
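Build a multi-agent Dou Dizhu (a Chinese card game) HTTP mini-game in Python.
- Three AI agents take turns playing cards
- A simple web frontend displays the game state
- Backend uses FastAPI + WebSocket to push state updates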
EOF
# 3. bootstrap — generates plan.md + manifest.json + workspace/ scaffold
./bin/tm-bootstrap # ≈ $0.05–$0.20, 30–60s
./bin/tm-bootstrap --dry-run # preview without writing
./bin/tm-bootstrap --force # overwrite if you've run it before
# 4. review what it produced, then launch
$EDITOR plan.md # tweak ordering / wording if needed
./bin/tm-pm-up # click ▶ 启动 in the browser

tm-bootstrap outputs:
- plan.md: 8–12 ordered tasks (with signal_cmds where verifiable)
- manifest.json: 4–6 audit dimensions tailored to THIS project
- workspace/<scaffold>: directory tree + minimal placeholder files (empty class shells, requirements.txt, basic README) so the first executor task has imports to satisfy
After this the agent team has cracks to fill, and the normal 0→done loop takes over.
Two launch flows depending on how you want to spawn agents:
# Dashboard-driven: PM + helpers + browser, NO agents. Click ▶ 启动 in
# the dashboard to spawn supervisor + 1 executor; use + executor / ■ 停止
# to scale up or wind down. Recommended for interactive sessions.
./bin/tm-pm-up
# One-shot: bring up everything immediately (agents start auto-go).
./bin/tm-team-up [N] # default: 1 executor; pass 2 or 3 for parallel
# Stopping
./bin/tm-pm shutdown "<reason>" # graceful (lets in-flight tasks finish)
./bin/tm-pm stop # force-stop daemon
./bin/tm-pm reset # wipe runtime state and start over
TM_TEAM_DOWN=1 ./bin/tm-pm-up # hard tear-down (PM + watchdog + gh-sync + agents)
TM_TEAM_DOWN=1 ./bin/tm-team-up # same; either works

Both flows bring up the same four background helpers:
| Helper | Started by default | Skip with | Purpose |
|---|---|---|---|
| PM daemon | yes | — | Dispatches tasks to workers (zero-token state machine) |
| PM watchdog | yes | --no-watchdog | Auto-restarts PM on crash (rate-limited 5/60s) |
| gh-sync loop | yes (auto-skip if no gh CLI) | --no-gh-sync | Mirrors escalations/ to GitHub Issues every 30 min |
| tm-web dashboard | yes | --no-dashboard | Browser dashboard auto-opens at localhost:7891 |
tm-web (auto-launched by tm-pm-up / tm-team-up) is the primary control surface. Two tabs:
- 迭代项目 (Iterate Project): Team Control card with ▶ 启动 (Start) / + executor / ■ 停止 (Stop) buttons (drives the same _tm-pty-bg.py spawn path that tm-team-up uses), live workers + tasks, click-to-view modal for status reports / decisions / vision proposals, recent PM events log. Topbar shows PM dot + Claude budget pills (5h / 7d).
- Issue 管理 (Issue Management): gh-sync loop heartbeat with countdown to next sync, escalation→issue mapping (linkable to GitHub when remote is detected), 立即同步 (Sync Now) button, last 25 sync log entries.
POST endpoints (also usable from CLI / curl):
curl -X POST http://localhost:7891/api/team/start -d '{"executors": 2}'
curl -X POST http://localhost:7891/api/team/add-executor
curl -X POST http://localhost:7891/api/team/stop
curl -X POST http://localhost:7891/api/sync-now

./bin/tm-pm watch # 1Hz terminal dashboard
./bin/tm-web # 1Hz browser dashboard at http://localhost:7891 (incl. Claude budget meter)
./bin/tm-pm status # one-shot snapshot
./bin/tm-pm tail # live PM event log
./bin/tm-status-report # generate markdown report under status-reports/
./bin/tm-status-report --stdout # print report to terminal
./bin/tm-risk-list # tasks at risk (reclaimed often / failing / stuck)
./bin/tm-pm escalations # permanent failures
./bin/tm-context list # all task statuses
./bin/tm-context done # done tasks with summaries
./bin/tm-decision list # supervisor's decision log
./bin/tm-github-sync # one-shot escalation → GitHub issue sync
./bin/tm-github-sync --dry-run # preview what would change
tail -f gh-sync.log # see periodic sync results

Each Claude window's title bar shows live project state (workaround for Claude Code's statusLine refreshing only on agent turns):
● ZeroProgramer · 8/13 · ▶3 · 3/4w · sup
project-name · done/total · ▶in_progress · busy/total workers · role abbreviation
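
The title layout above can be read as a simple format string. The sketch below is an illustration only: `format_title` and its parameter names are hypothetical, and the leading dot is copied from the example line, not from any documented rule.

```python
def format_title(project: str, done: int, total: int, in_progress: int,
                 busy: int, workers: int, role: str) -> str:
    """Build a title string in the layout shown above.

    Hypothetical helper: the field order follows the legend line,
    and the leading dot is copied verbatim from the example.
    """
    return f"● {project} · {done}/{total} · ▶{in_progress} · {busy}/{workers}w · {role}"


# Reproduces the example line from this section.
print(format_title("ZeroProgramer", 8, 13, 3, 3, 4, "sup"))
```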
./bin/tm-claude-supervisor # the brain (meta-decisions)
./bin/tm-claude-planner # opt-in: separate planner Claude session (4-agent mode); skip in default 3-agent
./bin/tm-claude-executor # work (edits workspace/)

Each wrapper auto-submits go, spawns a background title-keeper, cleans up on exit.
./bin/tm-decision new "<title>" "<context>" "<decision>" "<consequences>"
./bin/tm-decision list
./bin/tm-supervise revise-goal "<rationale>" # L6: snapshot goal.md before editing
./bin/tm-promote # L5: see workspace diff
./bin/tm-promote --apply pm-daemon.py # explicitly promote a single file
./bin/tm-goal-snapshot list # past goal.md snapshots
./bin/tm-goal-snapshot diff 1 # current vs latest snapshot

You can drop ambitious directions into the system at any moment:
# Append a one-liner to vision.md
./bin/tm-vision add "Web UI dashboard with real-time agent activity stream"
# Or open vision.md in your editor
./bin/tm-vision edit
# See current vision.md + any LLM-generated proposals
./bin/tm-vision list
# Have an LLM brainstorm 2 ambitious directions
./bin/tm-vision propose 2
# Inspect proposal #1
./bin/tm-vision show 1
# Wipe LLM proposals (vision.md untouched)
./bin/tm-vision clear-proposals

No restart needed. The next supervisor cycle (and planner cycle if --with-planner) reads them. The supervisor's workflow includes "review vision/proposals → decide promote / defer".
vision.md is user-authored (like goal.md, not wiped by reset). proposals/ is agent-generated and is wiped by reset.
Every run leaves these artifacts:
project/
├── goal.md # current "done" criteria (you write)
├── vision.md # long-term ambitions (you write — innovation source)
├── status-reports/ # supervisor's periodic markdown reports
│ └── 2026-05-08T10-30Z.md
├── decisions/ # ADR-style decision log
│ └── 001-skip-promoting-tm-statusline.md
├── proposals/ # LLM-brainstormed ambitious directions awaiting supervisor review
│ └── 2026-05-08T11-15Z-web-dashboard.md
├── goal-history/ # snapshot of goal.md before each L6 revision
├── escalations/ # tasks that permanently failed (auto-mirrored to GitHub Issues)
├── .gh-issue-map.json # escalation file → live GitHub issue number (managed by tm-github-sync)
├── gh-sync.log # periodic-sync result log (one entry per 30 min)
├── workspace/ # the code executors actually edit
├── pm.log # PM event stream
├── supervisor.log # supervisor decisions
└── bin/.promoted-bak/ # backups from each L5 promotion
Anyone can later read status-reports for progress, decisions for "why did it do this", goal-history for direction evolution, escalations for what went wrong.
| Process | Job | Token cost (3-agent default) | Token cost (4-agent --with-planner) |
|---|---|---|---|
| pm-daemon.py | State machine dispatching | 0 | 0 |
| tm-title-keeper (×N) | Window title refresh | 0 | 0 |
| tm-pm watch / tm-web / tail etc. | Read-only monitoring | 0 | 0 |
| supervisor (Claude) | Patrol + decisions + planning | ~25% | ~5% |
| planner (Claude) | Cross-dimension search | (folded into supervisor) | ~25% |
| executor × N (Claude) | Actually editing code | ~75% | ~70% |
~70–75% of the LLM budget goes to executors writing code, by design. The 3-agent default saves a separate planner context (~25% lower steady-state cost) at the price of a denser supervisor; 4-agent mode splits the load when the supervisor saturates.
When tokens run out: planner / executor calls fail → escalations → supervisor sees the pattern → calls tm-supervise shutdown → PM exits gracefully. The system decides for itself when to stop.
ZeroProgramer keeps escalations in sync with GitHub Issues automatically. Whenever a task fails permanently and lands in escalations/task-NNNN.md, the linked GitHub issue is opened (or closed when the escalation is resolved). This gives you a familiar inbox for human follow-up without leaving the loop.
escalations/task-0042.md ←── auto-mirrored ──→ github.com/<owner>/<repo>/issues/127
tm-pm-up and tm-team-up (default mode) both start a background tm-gh-sync-loop alongside the PM daemon and watchdog. The loop calls tm-github-sync once at startup, then every TM_GH_SYNC_INTERVAL seconds (default 1800 = 30 min). Result: opening or closing escalations propagates to GitHub within ~30 min, no manual cron.
Auto-skip rules — the loop silently skips any iteration when:
- gh CLI isn't on PATH (install from https://cli.github.com/)
- TM_GH_ENABLED=0 is set (project is on GitLab / Bitbucket / not hosted)
- git remote get-url origin doesn't yield a GitHub URL

So it is safe to leave on for non-GitHub projects: the loop just no-ops every 30 min.
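
The three auto-skip rules can be expressed as one pure decision function. This is a minimal sketch under stated assumptions, not the loop's actual code: `skip_reason` and `read_origin_url` are hypothetical names, and the substring test for "github.com" is a guess at how a GitHub remote might be detected.

```python
import shutil
import subprocess
from typing import Mapping, Optional


def skip_reason(env: Mapping[str, str], gh_on_path: bool,
                origin_url: Optional[str]) -> Optional[str]:
    """Return why a sync pass would be skipped, or None if it should run."""
    if not gh_on_path:
        return "gh CLI not on PATH"
    if env.get("TM_GH_ENABLED", "1") == "0":
        return "TM_GH_ENABLED=0"
    # Assumption: a remote counts as GitHub if its URL mentions github.com.
    if not origin_url or "github.com" not in origin_url:
        return "origin is not a GitHub remote"
    return None


def read_origin_url() -> Optional[str]:
    """Ask git for the origin URL; None if git is missing or no origin is set."""
    try:
        result = subprocess.run(["git", "remote", "get-url", "origin"],
                                capture_output=True, text=True)
    except FileNotFoundError:
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def gh_available() -> bool:
    """True when an executable named gh is found on PATH."""
    return shutil.which("gh") is not None
```

A caller would combine them as `skip_reason(os.environ, gh_available(), read_origin_url())` and skip the pass whenever the result is not None.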
./bin/tm-github-sync # one-shot sync (what the loop calls)
./bin/tm-github-sync --dry-run # show what would change without doing it
./bin/tm-gh-sync-loop # foreground single-shot for debugging

- Mapping file: .gh-issue-map.json at repo root, mapping task-NNNN.md → issue#. Wiped by tm-pm reset along with other runtime state.
- Loop log: gh-sync.log, one line per pass; captures sync output and any auth/network errors.
- PID file: gh-sync.pid, used by tm-team-up teardown.
Currently one-way (escalations → issues). Coming work in vision.md: reverse direction (issue label tm-fix → automatic escalation), and broader git plumbing (auto-commit promoted workspace changes, branch-per-task, PR creation when goal.md ships). Tracked under "GitHub-native loop" in vision.md.
| Env | Default | Controls |
|---|---|---|
| TM_GH_ENABLED | 1 | Set to 0 to disable sync entirely (loop becomes no-op) |
| TM_GH_REPO | from git remote | Override target repo, e.g. owner/repo |
| TM_GH_LABEL | tm-escalation | GitHub label attached to opened issues |
| TM_GH_SYNC_INTERVAL | 1800 | Seconds between loop iterations |
Tasks are defined in plan.md using a simple text format:
1. First task
   signal_cmd: make build
   additional context on continuation lines
2. Second task (no signal_cmd required)
3. Third task
   signal_cmd: pytest -q tests/
   depends_on: [1, 2]
Grammar rules:
- Numbered items (1., 2., ...) or bulleted items (-, *) start a new task
- Blank lines separate task blocks
- signal_cmd: <command> (indented) is an optional shell command executed by tm-done; retried up to MAX_SIGNAL times (default: 5) before the task is marked failed and escalated
- depends_on: <id-list> (indented) is an optional dependency declaration; the task waits until all parent tasks are done
- Continuation lines (indented, no special prefix) extend the task title for documentation
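
To make the grammar concrete, here is a minimal parser sketch. It is not the framework's parser: `Task`, `parse_plan`, `ready`, and both regular expressions are hypothetical and only inferred from the rules above (for example, bulleted items are assumed to receive the next sequential id).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical patterns inferred from the grammar rules, not from the source.
TASK_START = re.compile(r"^(?:(\d+)\.|[-*])\s+(.*)$")
FIELD = re.compile(r"^\s+(signal_cmd|depends_on):\s*(.*)$")


@dataclass
class Task:
    id: int
    title: str
    signal_cmd: Optional[str] = None
    depends_on: List[int] = field(default_factory=list)


def parse_plan(text: str) -> List[Task]:
    tasks: List[Task] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines only separate task blocks
        start = TASK_START.match(line)
        if start:
            number, title = start.groups()
            # Assumption: bulleted items get the next sequential id.
            task_id = int(number) if number else len(tasks) + 1
            tasks.append(Task(task_id, title.strip()))
            continue
        if not tasks or not line[0].isspace():
            continue  # stray text outside any task block is ignored
        found = FIELD.match(line)
        if found is None:
            tasks[-1].title += " " + line.strip()  # continuation line
        elif found.group(1) == "signal_cmd":
            tasks[-1].signal_cmd = found.group(2).strip()
        else:
            tasks[-1].depends_on = [int(n) for n in re.findall(r"\d+", found.group(2))]
    return tasks


def ready(tasks: List[Task], done: Set[int]) -> List[int]:
    """Ids of tasks that are not done and whose parents are all done."""
    return [t.id for t in tasks if t.id not in done and set(t.depends_on) <= done]


PLAN = """\
1. First task
   signal_cmd: make build
   additional context on continuation lines

2. Second task (no signal_cmd required)

3. Third task
   signal_cmd: pytest -q tests/
   depends_on: [1, 2]
"""

tasks = parse_plan(PLAN)
print(ready(tasks, done=set()))   # tasks 1 and 2 have no parents
print(ready(tasks, done={1, 2}))  # task 3 becomes ready
```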
Signal command behavior (strict mode + signal_cmd):
- Executor runs tm-done "<summary>" → PM checks if task has a signal_cmd
- If yes, tm-done executes it and reports the exit code
- Exit 0 → task marked done immediately (skips LLM review when in strict mode)
- Exit non-zero → task sent back to todo queue with history, re-assigned with feedback
- After MAX_SIGNAL consecutive failures → task marked failed and escalated
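
The signal_cmd flow above amounts to a small state transition keyed on the exit code. A minimal sketch, assuming the task is escalated exactly when the consecutive-failure count reaches MAX_SIGNAL; `run_signal`, `next_state`, and the state labels are hypothetical names, not the framework's API.

```python
import subprocess
import sys
from typing import Tuple

MAX_SIGNAL = 5  # default quoted in the grammar rules


def run_signal(signal_cmd: str) -> int:
    """Run a signal_cmd through the shell and return its exit code."""
    return subprocess.run(signal_cmd, shell=True).returncode


def next_state(exit_code: int, failures: int) -> Tuple[str, int]:
    """Return (state, consecutive_failures) after one tm-done attempt."""
    if exit_code == 0:
        return "done", failures
    failures += 1
    if failures >= MAX_SIGNAL:
        return "failed", failures  # would be escalated
    return "todo", failures  # re-queued with history


# Usage: a command that exits 0 completes the task on the first attempt.
code = run_signal(f'"{sys.executable}" -c "raise SystemExit(0)"')
print(next_state(code, failures=0))
```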
| Env | Default | Controls |
|---|---|---|
| PM_FOREVER | 0 | 1 = idle on empty queue, exit only on shutdown event |
| PM_STRICT | 0 | 1 = run tm-review on every done event (or skip review for tasks with signal_cmd; see Strict mode below) |
| PM_GOAL_REVIEW | 0 | 1 = run tm-goal-review before exit-on-all-done |
| STALE_AFTER_SEC (constant) | 120 | Seconds without events before a worker is GC'd |
| NAG_AFTER_SEC | 40 | Seconds of work before PM nags a worker |
| TM_PLAN_CLEAN_COOLDOWN | 300 | After "clean" verdict, planner waits this long before re-evaluating |
| SUPERVISE_INTERVAL | 600 | Seconds between supervisor patrol cycles |
| TM_PROJECT_NAME | (dir basename) | Override the project name shown in statusLine + window title (otherwise uses the directory name) |
| TM_TITLE_INTERVAL | 2 | Title bar refresh frequency (s) |
| PLANNER_INTERVAL | 60 | Pace for the bash-based tm-planner daemon |
| TM_MODEL_VERIFIER | (claude default) | Model for deterministic checks (tm-review, tm-goal-review). Recommend a cheap fast model: claude-haiku-4-5-20251001 |
| TM_MODEL_CREATIVE | (claude default) | Model for creative work (tm-profile, tm-assess, tm-vision, tm-plan, tm-auto-loop). Use the strong default unless you want to test a specific version. |
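
As an illustration of how the two worker timers in the table might interact, the sketch below classifies a worker from timestamps alone. It is an assumption-laden example: `worker_action` and its arguments are hypothetical, and the precedence (stale check before nag check) is a guess, not documented behavior.

```python
from typing import Optional

STALE_AFTER_SEC = 120  # defaults from the table above
NAG_AFTER_SEC = 40


def worker_action(now: float, last_event_at: float,
                  task_started_at: Optional[float]) -> str:
    """Return "gc", "nag", or "ok" for one worker.

    Assumption: a silent worker is garbage-collected before any nag is sent.
    """
    if now - last_event_at >= STALE_AFTER_SEC:
        return "gc"
    if task_started_at is not None and now - task_started_at >= NAG_AFTER_SEC:
        return "nag"
    return "ok"


print(worker_action(now=1000, last_event_at=990, task_started_at=950))
```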
Set PM_STRICT=1 to enable automatic code review before marking tasks done:
export PM_STRICT=1
./bin/tm-team-up 2

Strict mode review loop:
- Worker runs tm-done "<summary>" with task complete
- PM checks if task has a signal_cmd field:
  - If yes: tm-done already executed the signal_cmd and verified it exited 0 (a non-zero exit would have re-queued the task); LLM review is skipped (cost optimization)
  - If no: PM spawns tm-review to judge the worker's output (code quality, test coverage, docs completeness, etc.)
- Review verdict on each attempt:
  - PASS → task marked done, moves to next task
  - FAIL → task sent back to todo queue with reviewer feedback; re-assigned with context "⚠️ previously failed review (N/3); address feedback" (where N is the number of failed attempts so far: 1 or 2)
- Retry limit: After MAX_REVIEW consecutive review failures (default: 3 attempts) → task marked failed and escalated to escalations/ directory
Review retry counter examples:
- Attempt 1 fails → "⚠️ previously failed review (1/3); address feedback"
- Attempt 2 fails → "⚠️ previously failed review (2/3); address feedback"
- Attempt 3 fails → task marked failed, escalated, no more retries
- Any attempt passes → task done immediately, remaining attempts unused
Use PM_REVIEW_ALWAYS=1 (in strict mode) to force LLM review even for tasks with passing signal_cmd:
export PM_STRICT=1 PM_REVIEW_ALWAYS=1
./bin/tm-team-up 2

This trades cost (2× token spend per signal_cmd task) for extra confidence when signal_cmd alone isn't sufficient.
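
The strict-mode rules can be condensed into two small decisions: whether an LLM review runs at all, and what happens after a verdict. A minimal sketch, with hypothetical function names and state labels; only MAX_REVIEW, the flags, and the feedback wording come from the text above.

```python
from typing import Optional, Tuple

MAX_REVIEW = 3  # default quoted above


def needs_llm_review(strict: bool, has_signal_cmd: bool, review_always: bool) -> bool:
    """Mirror PM_STRICT / PM_REVIEW_ALWAYS: review runs only in strict mode,
    and a passing signal_cmd skips it unless review is forced."""
    if not strict:
        return False
    return review_always or not has_signal_cmd


def review_outcome(passed: bool, attempt: int) -> Tuple[str, Optional[str]]:
    """Return (state, feedback_context) after review attempt number `attempt`."""
    if passed:
        return "done", None
    if attempt >= MAX_REVIEW:
        return "failed", None  # escalated, no more retries
    note = f"⚠️ previously failed review ({attempt}/{MAX_REVIEW}); address feedback"
    return "todo", note


print(review_outcome(passed=False, attempt=1))
```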
Verifier calls (PASS/FAIL grading, DONE/CONTINUE judgment) are deterministic checks that don't need a top-tier model. Routing them to a cheap fast model can halve total token spend without quality loss:
# Cheap model for verification, default for creative work
export TM_MODEL_VERIFIER=claude-haiku-4-5-20251001
./bin/tm-team-up 2

Why PM is Python, not Claude: PM's job (scan events, pair workers, update fields) is a pure state machine, so no LLM judgment is needed. Running an LLM 4 times per second to do this would burn budget for no semantic value and introduce hallucination risk. LLMs earn their keep on creativity / judgment / context understanding; leave those to supervisor / planner / executor.
Why file mailbox, not socket / RPC: full decoupling. PM dies → workers can keep posting events for the next daemon. Worker crashes → PM auto-GCs and reclaims tasks. File IO is 10× simpler than network IPC and you can debug by ls events/.
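
A file mailbox of this kind can be sketched in a few lines. The example below is an illustration, not the framework's implementation: the filename scheme, the event fields (`type`, `worker`), and the helper names `post_event` / `drain_events` are all assumptions. It writes each event to a temporary file and renames it into place, so a reader never sees a half-written file.

```python
import json
import os
import tempfile
import time
import uuid
from pathlib import Path
from typing import List


def post_event(events_dir: Path, event: dict) -> Path:
    """Drop one event into the mailbox as its own JSON file."""
    events_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming: zero-padded timestamp for ordering, uuid for uniqueness.
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
    tmp = events_dir / f".{name}.tmp"
    tmp.write_text(json.dumps(event), encoding="utf-8")
    final = events_dir / name
    os.replace(tmp, final)  # atomic rename within the same directory
    return final


def drain_events(events_dir: Path) -> List[dict]:
    """Read and delete every pending event, oldest filename first."""
    events = []
    for path in sorted(events_dir.glob("*.json")):
        events.append(json.loads(path.read_text(encoding="utf-8")))
        path.unlink()
    return events


# Usage: workers can post while no daemon is running; the next daemon drains them.
with tempfile.TemporaryDirectory() as tmp_dir:
    mailbox = Path(tmp_dir) / "events"
    post_event(mailbox, {"type": "join", "worker": "w1"})
    post_event(mailbox, {"type": "done", "worker": "w1"})
    drained = drain_events(mailbox)
    leftover = drain_events(mailbox)
```

Because every event is a separate file, no lock is shared between writers, and pending events remain inspectable with ls events/.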
Why role via TM_ROLE env var: each tm-claude-* wrapper sets TM_ROLE before spawning Claude. The SessionStart hook reads it from stdin JSON, includes it in the join event. PM stores per-worker role and only routes exec tasks to executors (skips supervisor/planner).
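
Role-based routing can be sketched as a filter applied before pairing tasks with workers. The worker record shape (`role`, `task`) and the role label "executor" are assumptions made for illustration; the source only states that PM stores a per-worker role and skips supervisor/planner.

```python
from typing import Dict, List, Tuple


def pair_tasks(todo: List[str], workers: Dict[str, dict]) -> List[Tuple[str, str]]:
    """Pair queued task ids with idle executors, in order.

    Workers with any other role (supervisor, planner) are never assigned
    exec tasks; busy executors are skipped as well.
    """
    idle = [wid for wid, info in workers.items()
            if info["role"] == "executor" and info.get("task") is None]
    return list(zip(todo, idle))


workers = {
    "w1": {"role": "supervisor", "task": None},
    "w2": {"role": "executor", "task": None},
    "w3": {"role": "executor", "task": "t0"},
    "w4": {"role": "executor", "task": None},
}
print(pair_tasks(["t1", "t2", "t3"], workers))
```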
| Symptom | Action |
|---|---|
| One worker isn't doing anything | ./bin/tm-pm status — check role; if mis-registered, GC and re-open the window |
| Task stuck in-progress | ./bin/tm-context show <id> — check signal_history; likely a flaky signal_cmd |
| Worker count climbs unbounded | User opened too many windows / used /clear repeatedly; ./bin/tm-pm gc purges stale ones |
| PM crashed | Check tail of pm.log for traceback; ./bin/tm-pm start to restart |
| Want to start over | ./bin/tm-pm reset && ./bin/tm-team-up 2 |
| GitHub issues not updating | tail -20 gh-sync.log; auth: gh auth status; install: https://cli.github.com/ |
| Project isn't on GitHub | Set TM_GH_ENABLED=0 (or pass --no-gh-sync once) — sync loop becomes a no-op |
bin/
├── pm-daemon.py # background PM
├── tm-pm # PM control (start/stop/status/watch/...)
├── tm-pm-up # PM + helpers + browser, NO agents (dashboard-driven flow)
├── tm-team-up # one-shot: PM + helpers + agents (start everything immediately)
├── tm-bootstrap # one-shot 0→1: goal.md → plan.md + workspace scaffold
├── tm-init # install framework into target dir
├── tm-claude-supervisor # spawn supervisor window
├── tm-claude-planner # spawn planner window (opt-in, 4-agent mode only)
├── tm-claude-executor # spawn executor window
├── tm-launch-helpers # spawn planner + N executors (no supervisor) — legacy helper
├── tm-status-report # markdown weekly report
├── tm-decision # ADR decision log
├── tm-risk-list # risk register
├── tm-promote # L5 workspace → bin sync
├── tm-goal-snapshot # L6 goal.md history
├── tm-supervise # supervisor CLI (note/shutdown/revise-goal)
├── tm-plan-cycle # planner CLI (add/clean)
├── tm-done # executor CLI
├── tm-context # task / history queries
├── tm-vision # innovation channel: add / propose / list / show / edit
├── tm-github-sync # one-shot mirror of escalations → GitHub Issues
├── tm-gh-sync-loop # background daemon that calls tm-github-sync every 30 min
├── tm-web # browser dashboard (single-page, polls /api/state, shows Claude budget)
├── tm-dashboard # native tkinter dashboard (--native; experimental on macOS)
├── tm-title-keeper # live window-title refresher
├── tm-status-title # title bar text generator
├── tm-statusline # Claude Code statusLine command
├── tm-{session,prompt,stop,tool}-hook # Claude Code hooks
└── tm-{plan,review,assess,goal-review,profile} # one-shot claude -p tools
- Tutorial — end-to-end walkthrough from clone to first run (~20 min)
- Design Journey — why the architecture is what it is (decisions, lessons, deliberate non-features)
- Contributing — PR guidelines + architecture invariants
- README.zh.md — Chinese version
Built on Claude Code. Design inspired by agent-self-iteration (the previous-generation single-agent self-iterator from the same family).
License: MIT