canesin/coder

coder

MCP server that orchestrates gemini, claude, and codex CLI agents across three composable pipelines: Develop, Research, and Design.

Each pipeline step is an independent machine — callable as a standalone MCP tool or composed into full workflows. An LLM host (Claude Code, Cursor, etc.) connects to the MCP server and drives the tools.

Prerequisites

| Requirement | Notes |
| --- | --- |
| Node.js >= 22 | Runtime |
| gemini CLI | Default agent for issue selection, plan review, committing |
| claude (Claude Code) | Default agent for planning, implementation |
| codex CLI | Default agent for code review, coalescing |
| gh CLI | GitHub issue listing and PR creation (issueSource: "github") |
| glab CLI | GitLab issue listing and MR creation (issueSource: "gitlab") |

Agent role assignments are configurable — any role can use any of the three backends.

Install

From the latest GitHub release:

npm install -g https://github.com/canesin/coder/releases/download/v1.0.0/canesin-coder-1.0.0.tgz

Or from source:

git clone https://github.com/canesin/coder.git
cd coder
npm install
npm link

Quick start

As MCP server (primary interface)

Add to your MCP client config (.mcp.json, Claude Code settings, Cursor, etc.):

{
  "mcpServers": {
    "coder": {
      "command": "coder-mcp"
    }
  }
}

Or with explicit path (from source):

{
  "mcpServers": {
    "coder": {
      "command": "node",
      "args": ["./bin/coder-mcp.js"]
    }
  }
}

Or run directly:

coder-mcp                    # stdio (default)
coder-mcp --transport http   # HTTP on 127.0.0.1:8787/mcp

CLI (management)

coder status                  # workflow state and progress
coder status --watch          # refresh every 3s
coder events                  # stream structured log events
coder events --follow         # tail logs in real-time
coder cancel                  # cancel current workflow run
coder pause                   # pause at next checkpoint
coder resume                  # resume paused run
coder config                  # resolved configuration
coder steering generate       # create steering context
coder steering update         # refresh steering context
coder spec check <dir>        # validate spec directory structure
coder debug env               # show env vars agents receive
coder ppcommit                # commit hygiene (all files)
coder ppcommit --base main    # commit hygiene (branch diff only)
coder version                 # version, branch, and commit info
coder serve                   # start MCP server (delegates to coder-mcp)

Pipelines

Develop

Picks up issues from GitHub, GitLab, Linear, or a local manifest, implements code, and pushes PRs/MRs:

issue-list → issue-draft → planning ⇄ plan-review → implementation → quality-review → pr-creation
coder_workflow { action: "start", workflow: "develop" }

The develop pipeline can run in loop mode to process multiple issues autonomously. Loop state (queue, progress, heartbeat) is exposed via coder status and the MCP coder_status tool. Crash recovery via ensureCleanLoopStart handles dirty branches, stale state, and interrupted runs.

Research

Turns ideas into validated, reference-grounded issue backlogs:

context-gather → deep-research → tech-selection → poc-validation → issue-synthesis → issue-critique → issue-publish
coder_workflow { action: "start", workflow: "research", pointers: "..." }

Design

Generates UI designs from intent descriptions via Google Stitch:

intent-capture → ui-generation → ui-refinement → spec-export
coder_workflow { action: "start", workflow: "design", designIntent: "..." }

Spec build

Ingests existing spec documents, architects issue breakdowns, and renders publishable issue sets:

spec-ingest → spec-architect → spec-render

Machines are callable standalone via MCP tools (coder_research_spec_ingest, etc.) or composed via the spec-build workflow runner.

Validate spec directories before ingesting:

coder spec check <specDir>

Architecture

Machines

Every pipeline step is a machine defined with defineMachine():

defineMachine({ name, description, inputSchema, execute })

Machines are auto-registered as MCP tools (coder_develop_planning, coder_research_context_gather, etc.) and composable into pipelines via WorkflowRunner.

src/machines/
  develop/     7 machines
  research/   10 machines (7 pipeline + 3 spec-build)
  design/      4 machines
  shared/      2 reusable (web-research, poc-runner)
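
As a rough illustration of the four documented fields, the sketch below defines one machine. The `defineMachine` function here is a local stand-in so the snippet runs by itself; the real helper's validation, the schema format, the machine name, and the return shape are all assumptions.

```javascript
// Local stand-in for the project's defineMachine(). The real helper is not
// shown in this README, so this stub only checks two of the documented fields.
function defineMachine({ name, description, inputSchema, execute }) {
  if (typeof name !== "string" || name.length === 0) {
    throw new TypeError("machine name must be a non-empty string");
  }
  if (typeof execute !== "function") {
    throw new TypeError("machine execute must be a function");
  }
  return { name, description, inputSchema, execute };
}

// Hypothetical machine: counts whitespace-separated words in a text input.
const wordCount = defineMachine({
  name: "example.word_count",
  description: "Counts whitespace-separated words in the input text",
  // Assumption: a JSON-Schema-like object; the real schema format may differ.
  inputSchema: {
    type: "object",
    properties: { text: { type: "string" } },
    required: ["text"],
  },
  execute(input) {
    const words = input.text.trim().split(/\s+/).filter(Boolean);
    return { status: "ok", data: { words: words.length } };
  },
});

console.log(wordCount.execute({ text: "plan review implement" }));
```

Because a machine is a plain object with an `execute` function in this sketch, the same definition could be called directly or handed to a runner that chains several machines.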

Agents

Three backends, assigned to roles via config:

| Backend | Class | Use case |
| --- | --- | --- |
| CLI | CliAgent | Complex tasks: planning, implementation, review |
| API | ApiAgent | Simple tasks: classification, JSON extraction |
| MCP | McpAgent | External MCP servers (Stitch) |

AgentPool.getAgent(role, { scope, mode }) manages lifecycle and caching. Roles: issueSelector, planner, planReviewer, programmer, reviewer, committer, coalesce.

Agents include automatic retry with configurable backoff and hang detection. If a primary agent fails, an optional fallback agent can take over (configured via agents.fallback).
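
One way to picture the retry-then-fallback behaviour is the synchronous sketch below. The config keys (`workflow.agentRoles`, `agents.fallback`, `agents.retry.retries`) come from this README; the function names, the `run` callback, and the attempt counting are hypothetical, and backoff delays are left out to keep the example synchronous.

```javascript
// Look up the primary and fallback agent names for a role.
function resolveAgents(role, config) {
  const primary = config.workflow?.agentRoles?.[role];
  if (!primary) throw new Error(`no agent assigned to role "${role}"`);
  const fallback = config.agents?.fallback?.[role] ?? null;
  return { primary, fallback };
}

// Try the primary once plus `retries` more times, then the fallback once.
// `run(agentName, attempt)` is a hypothetical callback returning true on success.
function runWithFallback(role, config, run) {
  const { primary, fallback } = resolveAgents(role, config);
  const retries = config.agents?.retry?.retries ?? 0;
  for (let attempt = 0; attempt <= retries; attempt++) {
    if (run(primary, attempt)) return { agent: primary, attempts: attempt + 1 };
  }
  if (fallback && run(fallback, 0)) return { agent: fallback, attempts: retries + 2 };
  return { agent: null, attempts: retries + (fallback ? 2 : 1) };
}

const exampleConfig = {
  workflow: { agentRoles: { reviewer: "codex" } },
  agents: { retry: { retries: 1 }, fallback: { reviewer: "claude" } },
};
// Simulate a primary that always fails and a fallback that succeeds.
console.log(runWithFallback("reviewer", exampleConfig, (agent) => agent === "claude"));
```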

During develop implementation, the programmer CLI uses workflow.timeouts.implementation for both wall-clock and hang detection so a long quiet session is not cut off by the default agents.retry.hangTimeoutMs (5 minutes).
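
The timeout choice can be sketched as a small lookup. This is an assumption-laden illustration: it treats every timeout as milliseconds (the README only states that for `hangTimeoutMs`), the function name is hypothetical, and the planning and plan-review branch follows the note in the Safety section.

```javascript
// Decide which limits apply to an agent call for a given stage.
// hangMs === null means per-call hang detection is disabled.
function timeoutsFor(stage, config) {
  const defaultHangMs = config.agents?.retry?.hangTimeoutMs ?? 300000; // 5 minutes
  const stageMs = config.workflow?.timeouts?.[stage] ?? null;

  // Implementation: the stage timeout bounds both wall-clock time and silence.
  if (stage === "implementation" && stageMs !== null) {
    return { wallClockMs: stageMs, hangMs: stageMs };
  }
  // Planning and plan review: only the stage timeout applies.
  if (stage === "planning" || stage === "planReview") {
    return { wallClockMs: stageMs, hangMs: null };
  }
  // Everything else falls back to the default hang timeout.
  return { wallClockMs: stageMs, hangMs: defaultHangMs };
}

const timeoutConfig = {
  agents: { retry: { hangTimeoutMs: 300000 } },
  workflow: { timeouts: { implementation: 3600000, planning: 900000 } },
};
console.log(timeoutsFor("implementation", timeoutConfig));
```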

Workflow control

coder_workflow is the unified control plane:

| Action | Description |
| --- | --- |
| start | Launch a pipeline run |
| status | Current stage, heartbeat, loop state, progress |
| events | Structured event log (afterSeq / limit). seq is the 1-based line index; run filtering can yield sparse pages, so use allRuns: true for cross-run history. |
| reconcile | If status shows a stale run (dead runner PID or heartbeat), mark the loop failed on disk so a new start is allowed. |
| pause | Pause at next checkpoint |
| resume | Resume paused run |
| cancel | Cooperative cancellation |
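
The paging behaviour of the events action can be illustrated with an in-memory log. The sketch assumes that `limit` is applied to the scanned lines before run filtering, which is one way a filtered page could come back sparse; the function name, the `nextAfterSeq` field, and the event shape are hypothetical.

```javascript
// Page through JSONL event lines. seq is the 1-based line index.
function pageEvents(lines, { afterSeq = 0, limit = 50, runId = null, allRuns = false } = {}) {
  const scanned = lines.slice(afterSeq, afterSeq + limit);
  const events = [];
  scanned.forEach((line, i) => {
    const event = JSON.parse(line);
    // Keep the event when no run filter applies or when it belongs to the run.
    if (allRuns || runId === null || event.runId === runId) {
      events.push({ ...event, seq: afterSeq + i + 1 });
    }
  });
  // The cursor advances by the number of lines scanned, not by matches.
  return { events, nextAfterSeq: afterSeq + scanned.length };
}

const eventLog = [
  '{"runId":"a","event":"machine_start"}',
  '{"runId":"b","event":"machine_start"}',
  '{"runId":"a","event":"machine_complete"}',
];
console.log(pageEvents(eventLog, { limit: 2, runId: "b" }));
```

Under these assumptions a caller keeps requesting pages with the returned cursor until a page scans no lines, rather than stopping at the first empty page.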

XState v5 models the lifecycle: idle → running → paused → completed/failed/cancelled/blocked.
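
The lifecycle can be read as a transition table. The sketch below uses a plain object rather than the XState API; the state names come from the line above, while the event names and the exact set of allowed transitions (for example, cancelling from paused) are assumptions.

```javascript
// Transition table: state -> event -> next state. Terminal states accept nothing.
const lifecycle = {
  idle: { START: "running" },
  running: {
    PAUSE: "paused",
    COMPLETE: "completed",
    FAIL: "failed",
    CANCEL: "cancelled",
    BLOCK: "blocked",
  },
  paused: { RESUME: "running", CANCEL: "cancelled" },
  completed: {},
  failed: {},
  cancelled: {},
  blocked: {},
};

// Return the next state, or the current state if the event is not valid here.
function nextState(state, event) {
  return lifecycle[state]?.[event] ?? state;
}

// Walk a short run: start, pause, resume, complete.
let current = "idle";
for (const event of ["START", "PAUSE", "RESUME", "COMPLETE"]) {
  current = nextState(current, event);
}
console.log(current);
```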

State

All state lives under .coder/ (gitignored):

| Path | Purpose |
| --- | --- |
| workflow-state.json | Per-issue step completion |
| loop-state.json | Multi-issue develop queue, loop status, heartbeat |
| checkpoint-{runId}.json | Pipeline step checkpoints per run |
| artifacts/ | ISSUE.md, PLAN.md, PLANREVIEW.md, REVIEW_FINDINGS.md |
| backups/ | Per-issue state snapshots for step-level resume across issues |
| steering/ | Persistent project context (product.md, structure.md, tech.md) |
| scratchpad/ | Research pipeline checkpoints |
| logs/*.jsonl | Structured event logs (tagged with runId) |
| state.db | Optional SQLite mirror |

Configuration

Layered: ~/.config/coder/config.json (user) → coder.json (repo) → MCP tool inputs.
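
A layered lookup like this is often implemented as a deep merge. The sketch below assumes that later layers override earlier ones and that arrays are replaced wholesale; neither detail is confirmed by this README, and `mergeLayers` is a hypothetical name.

```javascript
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Merge layers left to right without mutating any input.
// Nested objects are merged key by key; any other value replaces the old one.
function mergeLayers(...layers) {
  const out = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer ?? {})) {
      out[key] =
        isPlainObject(value) && isPlainObject(out[key])
          ? mergeLayers(out[key], value)
          : value;
    }
  }
  return out;
}

const userLayer = { workflow: { issueSource: "github", maxPlanRevisions: 3 } };
const repoLayer = { workflow: { issueSource: "gitlab" } };
const toolInput = { workflow: { maxPlanRevisions: 1 } };
console.log(mergeLayers(userLayer, repoLayer, toolInput));
```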

Claude, OpenRouter, and passEnv

  • models.claude is the single source for model, apiEndpoint, and which env var holds the API key (apiKeyEnv, e.g. OPENROUTER_API_KEY).
  • resolvePassEnv automatically adds every models.*.apiKeyEnv to the sandbox secret list, so you do not need to repeat OPENROUTER_API_KEY in sandbox.passEnv unless you use a fully custom passEnv array and want it explicit.
  • For OpenRouter-style endpoints (URL does not contain anthropic.com), the CLI sandbox gets ANTHROPIC_BASE_URL, ANTHROPIC_AUTH_TOKEN (from the key named in apiKeyEnv), and ANTHROPIC_API_KEY="" — derived from config, not from duplicating those names in passEnv.
  • Agent commands run with a non-login shell (bash -c), so ~/.profile / OpenRouter exports for your interactive terminal do not override models.claude in coder.json.
  • The model is still passed as claude --model … from models.claude.model; it is not set again via ANTHROPIC_MODEL in the environment.
{
  // Model selection (see coder.example.json for full structure)
  "models": {
    "gemini": { "model": "gemini-3-flash-preview" },
    "claude": { "model": "claude-sonnet-4-6" }
  },

  // Agent role assignments (gemini | claude | codex)
  "workflow": {
    "agentRoles": {
      "issueSelector": "gemini",
      "planner": "claude",
      "planReviewer": "gemini",
      "programmer": "claude",
      "reviewer": "codex",
      "committer": "gemini",
      "coalesce": "codex"
    },
    // Issue source: "github" (default), "linear", "gitlab", or "local"
    // github → gh CLI, gitlab → glab CLI, linear → Linear MCP, local → .coder/local-issues/
    "issueSource": "github",
    "localIssuesDir": ".coder/local-issues",
    "wip": { "push": true, "autoCommit": true },
    "maxPlanRevisions": 3,
    "resumeStepState": true,       // Preserve state/artifacts across issue retries (false = fresh start)
    "conflictDetection": true,     // Detect conflicts with open PR branches during planning
    "maxMachineRetries": 2,        // Max retries for Phase 3 machines (impl → review → PR)
    "retryBackoffMs": 5000,        // Backoff between Phase 3 retries
    // Pre-workflow health checks (tcp, command, or url)
    "preflight": {
      "checks": [
        { "type": "tcp", "host": "127.0.0.1", "port": 5432 },
        { "type": "command", "cmd": "docker ps" }
      ]
    },
    // Post-step hooks (shell commands triggered on workflow events)
    "hooks": [
      { "on": "machine_complete", "machine": "implementation", "run": "npm run lint" }
    ]
  },

  // Agent retry, hang detection, and fallback
  "agents": {
    "retry": {
      "retries": 1,
      "backoffMs": 5000,
      "retryOnRateLimit": true,
      "hangTimeoutMs": 300000
    },
    // Fallback agents when primary fails (role → agent name)
    "fallback": {}
  },

  // Commit hygiene (tree-sitter AST-based)
  // Presets: "strict" (default), "relaxed", "minimal"
  "ppcommit": {
    "preset": "strict",
    "enableLlm": true,
    "llmModelRef": "gemini"
  },

  // Test execution (setup/teardown hooks, health checks, timeouts)
  "test": {
    "command": "",
    "allowNoTests": false,
    "setup": [],
    "teardown": [],
    "healthCheck": null,
    "timeoutMs": 600000
  },

  // Design pipeline (requires Google Stitch)
  "design": {
    "stitch": { "enabled": false },
    "specDir": "spec/UI"
  }
}

See coder.example.json for a full example.
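
The OpenRouter rule from the passEnv notes in this section can be sketched as a pure function. The three variable names and the `anthropic.com` test come from those notes; the function name, the handling of a missing key, and the empty result for Anthropic endpoints are assumptions, since the README does not describe that branch.

```javascript
// Derive the extra sandbox variables for the claude CLI from models.claude.
// `hostEnv` stands in for the host environment (for example process.env).
function claudeSandboxEnv(claude, hostEnv) {
  const endpoint = claude.apiEndpoint ?? "";
  if (endpoint === "" || endpoint.includes("anthropic.com")) {
    // Not covered by the README; this sketch adds nothing for this case.
    return {};
  }
  return {
    ANTHROPIC_BASE_URL: endpoint,
    // Token value is read from the variable named in apiKeyEnv.
    ANTHROPIC_AUTH_TOKEN: hostEnv[claude.apiKeyEnv] ?? "",
    ANTHROPIC_API_KEY: "",
  };
}

const derivedEnv = claudeSandboxEnv(
  { apiEndpoint: "https://openrouter.ai/api", apiKeyEnv: "OPENROUTER_API_KEY" },
  { OPENROUTER_API_KEY: "example-key" },
);
console.log(Object.keys(derivedEnv));
```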

ppcommit

Built-in commit hygiene checker using tree-sitter AST analysis. Three presets control strictness:

| Preset | Description |
| --- | --- |
| strict | All checks enabled (default) |
| relaxed | Disables magic numbers, narration, new-markdown, and workflow artifact checks |
| minimal | Only secrets and gitleaks; everything else off |

Blocks (in strict mode):

  • Secrets and API keys (+ gitleaks integration)
  • TODO/FIXME comments
  • LLM narration markers (Here we..., Step 1:, etc.)
  • Emojis in code (not strings)
  • Magic numbers
  • Placeholder code and compat hacks
  • Over-engineering patterns
  • New markdown files outside allowed directories

Each check can be individually toggled (e.g., "blockMagicNumbers": false). Optional LLM-assisted checks run through the Gemini API for deeper analysis.
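
How a preset and individual toggles might combine is sketched below. Only `blockMagicNumbers` is a flag name taken from this README; every other flag name is hypothetical, and the rule that explicit toggles override the preset is an assumption.

```javascript
// Hypothetical flag names, one per check listed above (only blockMagicNumbers is documented).
const ALL_CHECKS = [
  "blockSecrets", "blockTodos", "blockNarration", "blockEmojis", "blockMagicNumbers",
  "blockPlaceholders", "blockOverEngineering", "blockNewMarkdown", "blockWorkflowArtifacts",
];
// Checks the relaxed preset turns off, per the preset table.
const RELAXED_OFF = new Set([
  "blockMagicNumbers", "blockNarration", "blockNewMarkdown", "blockWorkflowArtifacts",
]);

function resolveChecks(ppcommit = {}) {
  const preset = ppcommit.preset ?? "strict";
  if (!["strict", "relaxed", "minimal"].includes(preset)) {
    throw new Error(`unknown ppcommit preset "${preset}"`);
  }
  const flags = {};
  for (const name of ALL_CHECKS) {
    if (preset === "minimal") flags[name] = name === "blockSecrets";
    else if (preset === "relaxed") flags[name] = !RELAXED_OFF.has(name);
    else flags[name] = true;
    // An explicit boolean in the config wins over the preset value.
    if (typeof ppcommit[name] === "boolean") flags[name] = ppcommit[name];
  }
  return flags;
}

console.log(resolveChecks({ preset: "strict", blockMagicNumbers: false }));
```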

Steering context

Persistent project knowledge in .coder/steering/ that agents receive automatically:

coder steering generate   # scan repo, create product.md / structure.md / tech.md
coder steering update     # refresh after significant changes

Also available as MCP tools (coder_steering_generate, coder_steering_update) and the coder://steering MCP resource.

Hooks

User-defined shell commands triggered on workflow events. Configure in config.workflow.hooks[]:

{ "on": "machine_complete", "machine": "implementation", "run": "npm run lint" }

The machine field accepts a regex pattern for matching multiple machines.

Events: workflow_start, workflow_complete, workflow_failed, machine_start, machine_complete, machine_error, loop_start, loop_complete, issue_start, issue_complete, issue_failed, issue_skipped, issue_deferred.

Hook scripts receive CODER_HOOK_EVENT, CODER_HOOK_MACHINE, CODER_HOOK_STATUS, CODER_HOOK_DATA, and CODER_HOOK_RUN_ID environment variables. Failures are logged but never break the workflow.
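
Hook selection can be pictured as a filter over the configured hooks. In the sketch below the `on`, `machine`, and `run` fields come from this README; the event object shape, the function name, and the use of an unanchored regular expression are assumptions.

```javascript
// Decide whether a configured hook should fire for an event.
// `event` is a hypothetical shape: { type, machine? }.
function hookMatches(hook, event) {
  if (hook.on !== event.type) return false;
  if (hook.machine === undefined) return true; // no machine filter configured
  if (event.machine === undefined) return false;
  // Assumption: the pattern is tested unanchored against the machine name.
  return new RegExp(hook.machine).test(event.machine);
}

const hooks = [
  { on: "machine_complete", machine: "implementation", run: "npm run lint" },
  { on: "machine_complete", machine: "^plan", run: "echo plan step done" },
  { on: "workflow_failed", run: "echo workflow failed" },
];

const fired = hooks
  .filter((hook) => hookMatches(hook, { type: "machine_complete", machine: "planning" }))
  .map((hook) => hook.run);
console.log(fired);
```

With unanchored matching, a pattern such as `plan` would match both `planning` and `plan-review`, so anchoring with `^` and `$` is worth considering when a hook should target exactly one machine.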

Monitoring develop workflow (coder_status)

The MCP tool coder_status (and the same payload shape when embedded elsewhere) includes:

  • currentStage / activeAgent — coarse runner position from loop or lifecycle state. It can lag briefly right after a stage change.
  • steps and artifacts (issueExists, planExists, critiqueExists) — what exists on disk for the develop pipeline. Prefer these when you need to know whether ISSUE/PLAN/PLANREVIEW are present.
  • derivedArtifactPhase (present when runStatus is running or paused and the active workflow is develop): one of issue_draft, planning, plan_review, or past_plan_review, derived from steps plus artifact files. Omitted for research/design so stale develop artifacts do not mislabel the run. Use it when currentStage disagrees with artifacts (e.g. stage still develop_starting while planExists is true).
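
A simplified reading of the derived phase is sketched below. The field names and phase values come from the bullets above, but the mapping itself is a guess: the real derivation also consults steps, which this sketch ignores, and the function name is hypothetical.

```javascript
// Map artifact flags to a phase label. Returns undefined when the field
// would be omitted (run not active, or the workflow is not develop).
function derivePhase({ runStatus, workflow, artifacts }) {
  const active = runStatus === "running" || runStatus === "paused";
  if (!active || workflow !== "develop") return undefined;
  if (!artifacts.issueExists) return "issue_draft";
  if (!artifacts.planExists) return "planning";
  if (!artifacts.critiqueExists) return "plan_review";
  return "past_plan_review";
}

console.log(
  derivePhase({
    runStatus: "running",
    workflow: "develop",
    artifacts: { issueExists: true, planExists: true, critiqueExists: false },
  }),
);
```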

MCP tools are not shell commands — call coder_status through the MCP integration, not as a bash command name.

Plan review (Claude/Codex with sessions):

  • If the agent exits 0 but PLANREVIEW.md is still missing and stripped stdout is empty, the runner logs critique_retry_empty_output, clears planReviewSessionId, and performs one more attempt in a fresh session (critique_retry_fresh_session). That attempt uses a full retry prompt: read PLAN.md, with the same sections, constraints, and revision-round note as the primary review.
  • Nonzero exits and thrown errors log plan_review_execute_failed once per failed invocation. The entry includes stdoutLen/stderrLen from the result, or from the error when the sandbox attached streams.
  • If the critique is still missing after that, see critique_missing_after_review in develop logs.
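
The plan-review retry rules can be condensed into a decision function. The log event names are taken from the text above; the input shape, the action labels, and treating "stripped stdout" as whitespace-trimmed output are assumptions.

```javascript
// Decide what to do after one plan-review invocation.
// `freshRetryUsed` records whether the single fresh-session retry already ran.
function nextPlanReviewAction({ exitCode, threw = false, stdout = "", critiqueExists, freshRetryUsed }) {
  if (threw || exitCode !== 0) {
    return { action: "invocation_failed", logs: ["plan_review_execute_failed"] };
  }
  if (critiqueExists) {
    return { action: "done", logs: [] };
  }
  if (stdout.trim() === "" && !freshRetryUsed) {
    return {
      action: "retry_fresh_session",
      clearSessionId: true, // corresponds to clearing planReviewSessionId
      logs: ["critique_retry_empty_output", "critique_retry_fresh_session"],
    };
  }
  return { action: "critique_missing", logs: ["critique_missing_after_review"] };
}

console.log(
  nextPlanReviewAction({ exitCode: 0, stdout: "  \n", critiqueExists: false, freshRetryUsed: false }),
);
```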

Safety

  • Workspace boundaries enforced — symlink escape detection on workspace and scratchpad paths
  • Non-destructive reset between issues (opt-in destructiveReset)
  • Crash recovery at loop start — WIP-commits known branches, resets stale state
  • Health-check URLs restricted to localhost
  • One active run per workspace (concurrent starts force-cancel previous)
  • Session TTL with automatic cleanup (HTTP mode)
  • Agent hang detection with configurable timeout (default 5 min); planning and plan review disable per-call hang so workflow.timeouts.planning / planReview bound silence instead
  • Codex runs inside the host sandbox with --dangerously-bypass-approvals-and-sandbox for Linux compatibility
  • CODER_ALLOW_ANY_WORKSPACE=1 to allow arbitrary paths
  • CODER_ALLOW_EXTERNAL_HEALTHCHECK=1 for external health-check URLs
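
The localhost restriction on health-check URLs could look roughly like the check below. The environment variable name comes from the list above; the set of hostnames treated as local, the http/https requirement, and the function name are assumptions.

```javascript
// Hostnames treated as local. The WHATWG URL parser keeps brackets on IPv6 hosts.
const LOCAL_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

// Return true when a health-check URL may be used under the given environment.
function isHealthCheckUrlAllowed(rawUrl, env = {}) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  if (env.CODER_ALLOW_EXTERNAL_HEALTHCHECK === "1") return true;
  return LOCAL_HOSTS.has(url.hostname);
}

console.log(isHealthCheckUrlAllowed("http://localhost:3000/health"));
console.log(isHealthCheckUrlAllowed("https://example.com/health"));
```

Comparing the parsed hostname, rather than searching the raw string for "localhost", keeps a host such as `localhost.example.com` from passing the check.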

Troubleshooting

  • Claude responses truncated / “max output tokens”: Increase claude.maxOutputTokens in coder.json, or set CLAUDE_CODE_MAX_OUTPUT_TOKENS in the host environment and include it in sandbox.passEnv (see coder.example.json). The same pattern applies to input limits: claude.maxInputTokens / CLAUDE_CODE_MAX_INPUT_TOKENS.

Environment variables

| Variable | Purpose |
| --- | --- |
| GEMINI_API_KEY / GOOGLE_API_KEY | Gemini CLI + ppcommit LLM checks (auto-aliased) |
| ANTHROPIC_API_KEY | Claude Code |
| CLAUDE_CODE_MAX_INPUT_TOKENS / CLAUDE_CODE_MAX_OUTPUT_TOKENS | Claude Code context/output caps (optional; or set claude.maxInputTokens / claude.maxOutputTokens in coder.json) |
| OPENAI_API_KEY | Codex CLI |
| GITHUB_TOKEN | GitHub API (issues, PRs), used by gh CLI |
| GITLAB_TOKEN / GITLAB_API_TOKEN | GitLab API, used by glab CLI |
| LINEAR_API_KEY | Linear issue tracking |
| GOOGLE_STITCH_API_KEY | Design pipeline (Google Stitch) |
| CODER_ALLOW_ANY_WORKSPACE | Allow arbitrary workspace paths (default: restricted) |
| CODER_ALLOW_EXTERNAL_HEALTHCHECK | Allow non-localhost health-check URLs |

Contributing

See CONTRIBUTING.md.

License

MIT
