A set of agents, skills, and rules for disciplined AI-assisted development with Claude Code.
`CLAUDE.md` is the main instruction file loaded by Claude Code in every session.
It should stay small and define only the core behavioural constraints:
- Think before coding — state assumptions explicitly and surface ambiguities.
- Simplicity first — write the minimum code that solves the problem.
- Surgical changes — touch only what the task requires.
- Goal-driven execution — define success criteria and verify results.
Workflow-specific guidance should live in rules/ and skills/, not directly in CLAUDE.md.
Custom subagent definitions. Each agent has a narrow, well-defined role:
| Agent | Role |
|---|---|
| `planner` | Converts an approved task/spec into a deterministic `plan.md` |
| `implementer` | Executes an approved `plan.md` with minimal changes |
| `reviewer` | Reviews code changes against the plan, spec, and delivery artifacts |
| `antagonist` | Adversarial pre-PR reviewer: looks for blockers, hidden risks, weak validation, unsafe rollout, and scope creep |
| `tdd-guide` | Enforces test-first workflow: red → green → refactor |
| `tech-lead` | Research and planning only; never modifies code |
Slash commands that orchestrate multi-step workflows:
| Skill | Command | What it does |
|---|---|---|
| `sdd-start` | `/sdd-start` | Creates an intent-level `spec.md` v0 from a rough requirement, using `AskUserQuestionTool` to resolve blocking ambiguities |
| `sdd-research` | `/sdd-research` | Inspects the repository against `spec.md` and writes `research.md` with concrete code evidence |
| `sdd-refine` | `/sdd-refine` | Refines `spec.md` using `research.md`, producing the planning-ready version |
| `sdd-delivery-artifacts` | `/sdd-delivery-artifacts` | Creates `delivery_artifacts/*.md`, a feature-specific map of what must be produced or modified |
| `sdd-facts` | `/sdd-facts` | Creates `facts/*.md`, executable or verifiable assertions that prove important requirements |
| `execute-plan` | `/execute-plan` | Reads `plan.md` and drives every task to completion using a TDD loop, parallelizing disjoint tasks when safe |
| `deep-spec-review` | `/deep-spec-review` | Runs specialist reviewers (security, cost, compliance, ops, architecture) in parallel and conducts structured Q&A before edits |
| `architecture-decision-records` | `/architecture-decision-records` | Captures architectural decisions as structured ADR documents in `docs/adr/` |
| `spec-review` | `/spec-review` | Lightweight spec review |
Domain-specific standards and workflow rules loaded on demand:
rules/
├── common/ # Coding style, testing, security, patterns, hooks, devkit, jt-linter
├── rails/ # Rails-specific style, testing, security, patterns, hooks
├── terraform/ # Terraform, DynamoDB, IAM and infrastructure rules
└── sdd/ # Spec-Driven Development workflow and artifact rules
Claude Code project configuration — permissions allowlist, model selection, and token limits.
There are two main operating modes.
The lightweight mode is for trivial or mechanical work:
- Plan: invoke the `planner` agent with a task description.
- Execute: run `/execute-plan`.
- Review: invoke the `reviewer` agent.
This mode is appropriate for:
- typo fixes
- small copy changes
- obvious one-line configuration changes
- mechanical renames with no behavioural impact
Use SDD for non-trivial feature work, ambiguous requirements, behaviour changes, API changes, background jobs, observability changes, infrastructure changes, or anything where implementation should not start from a vague prompt.
The SDD workflow is:
/sdd-start
→ spec.md v0
/sdd-research
→ research.md
/sdd-refine
→ spec.md v1
/sdd-delivery-artifacts
→ delivery_artifacts/*.md
/sdd-facts
→ facts/*.md
planner
→ plan.md + plan/tN.md
/execute-plan
→ code changes + validation + tracking updates
reviewer
→ final review against spec, research, delivery artifacts, facts, and plan
antagonist
→ adversarial pre-PR review for blockers, hidden risk, weak validation, unsafe rollout, and scope creep
The goal is to avoid jumping from a rough requirement directly into implementation.
Create a playbook directory for the feature:
doc/playbook/<YYYYMMDD>_<feature_slug>/
Example:
doc/playbook/20260519_provider_output_observability/
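The naming convention can be expressed as a small helper. This is a sketch, not part of the toolkit: `playbook_dir` is hypothetical, and the slug rule (lowercase, runs of non-alphanumerics collapsed to one underscore) is an assumption, since the convention above only shows the result.

```python
import re
from datetime import date
from pathlib import Path


def playbook_dir(feature: str, on: date, root: str = "doc/playbook") -> Path:
    """Build <root>/<YYYYMMDD>_<feature_slug> for a feature name (hypothetical helper)."""
    # Assumed slug rule: lowercase, every run of non-alphanumerics becomes one underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", feature.lower()).strip("_")
    if not slug:
        raise ValueError("feature name produces an empty slug")
    return Path(root) / f"{on:%Y%m%d}_{slug}"


example = playbook_dir("Provider output observability", date(2026, 5, 19))
print(example.as_posix())  # doc/playbook/20260519_provider_output_observability
```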
Run:
/sdd-start doc/playbook/<feature>
This creates:
spec.md
At this stage, spec.md is an intent-level spec. It should describe:
- problem
- goal
- non-goals
- users or consumers
- desired behaviour
- EARS requirements
- acceptance scenarios
- observability expectations
- compatibility concerns
- research questions
sdd-start must not claim facts about the current codebase unless the user explicitly provided them.
If current behaviour depends on repository inspection, the spec should say:
Unknown. Must be discovered during /sdd-research.
Run:
/sdd-research doc/playbook/<feature>
This creates:
research.md
research.md is the source of truth for repository evidence. It should record:
- relevant files
- current flows
- existing tests
- contracts and external interfaces
- operational surfaces
- risks
- unknowns
- planning inputs
It must not create a plan or modify code.
Run:
/sdd-refine doc/playbook/<feature>
This updates spec.md using the evidence from research.md.
After this step, spec.md becomes the planning-ready version.
The refined spec should reconcile:
user intent + repository evidence
If research conflicts with the original intent, the conflict must be surfaced explicitly. The spec must not silently change product intent just because the current code makes something easier.
Run:
/sdd-delivery-artifacts doc/playbook/<feature>
This creates:
delivery_artifacts/*.md
delivery_artifacts/ is a variable, feature-specific map of what must be produced or modified.
It is not a fixed category list.
Examples of valid delivery artifact files:
delivery_artifacts/
├── 01-api-contracts.md
├── 02-domain-model.md
├── 03-jobs-consumers.md
└── 04-observability.md
or:
delivery_artifacts/
├── 01-grafana-dashboard.md
├── 02-terraform-alerts.md
└── 03-runbook.md
The actual files must be inferred from the feature.
Tests are not delivery artifacts. Tests belong in the plan and task validation sections.
Run:
/sdd-facts doc/playbook/<feature>
This creates:
facts/*.md
Facts are executable or verifiable assertions.
They exist because specs explain intent, but facts prove behaviour.
A fact should link back to one or more REQ-* requirements and eventually be proven by a deterministic check, such as:
- test
- contract check
- schema validation
- smoke test
- static analysis
- infrastructure validation
- CI job
Facts help reviewers answer:
Does this implementation actually prove the important behaviour, or does it merely look complete?
Invoke the planner agent only after these exist:
spec.md
research.md
delivery_artifacts/*.md
facts/*.md
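A readiness check for these prerequisites could look like the sketch below. `missing_planner_inputs` is a hypothetical helper; the only assumption is that each glob entry must match at least one file.

```python
import tempfile
from pathlib import Path

# Planner inputs from the list above; glob patterns must match at least one file.
REQUIRED_INPUTS = ["spec.md", "research.md", "delivery_artifacts/*.md", "facts/*.md"]


def missing_planner_inputs(playbook: Path) -> list[str]:
    """Return the required inputs that are absent from a playbook directory."""
    missing = []
    for pattern in REQUIRED_INPUTS:
        if "*" in pattern:
            present = any(playbook.glob(pattern))
        else:
            present = (playbook / pattern).is_file()
        if not present:
            missing.append(pattern)
    return missing


# Demo: a playbook with only a spec and one fact is not ready for the planner.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    (demo / "spec.md").write_text("# Spec\n")
    (demo / "facts").mkdir()
    (demo / "facts" / "001-example.md").write_text("FACT-001\n")
    demo_missing = missing_planner_inputs(demo)

print(demo_missing)  # ['research.md', 'delivery_artifacts/*.md']
```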
The planner must read:
- `spec.md`
- `research.md`
- every markdown file under `delivery_artifacts/`
- every markdown file under `facts/`
- referenced ADRs
- relevant rules
The planner produces:
plan.md
plan/t1.md
plan/t2.md
...
The plan must cover:
- every concrete artifact listed under `delivery_artifacts/*.md`
- every `@spec` fact listed under `facts/*.md`
Each task should include:
- `Delivers`
- `Implements Facts`
- allowed files
- implementation notes
- validation/tests
- done criteria
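The coverage rule amounts to a set difference between what the feature requires and what the tasks claim. The sketch below assumes the `Delivers` and `Implements Facts` sections have already been parsed into sets of identifiers; the `Task` class, the artifact ids, and `coverage_gaps` are hypothetical, and Markdown parsing is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A parsed plan/tN.md task; fields mirror the task sections listed above."""
    name: str
    delivers: set[str] = field(default_factory=set)          # artifact ids from delivery_artifacts/*.md
    implements_facts: set[str] = field(default_factory=set)  # FACT-* ids from facts/*.md


def coverage_gaps(tasks, artifacts, spec_facts):
    """Return (artifacts no task delivers, @spec facts no task implements), sorted."""
    delivered = set().union(*(t.delivers for t in tasks))
    implemented = set().union(*(t.implements_facts for t in tasks))
    return sorted(set(artifacts) - delivered), sorted(set(spec_facts) - implemented)


tasks = [
    Task("t1", delivers={"01-notification-content"}, implements_facts={"FACT-001"}),
    Task("t2", implements_facts={"FACT-002"}),
]
gaps = coverage_gaps(tasks, ["01-notification-content", "02-safety-rules"], ["FACT-001", "FACT-002"])
print(gaps)  # (['02-safety-rules'], [])
```

A non-empty result means the plan is incomplete and should go back to the planner before execution starts.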
Run:
/execute-plan doc/playbook/<feature>
Execution must follow the approved plan and modify only files allowed by the relevant task.
Validation must use the applicable project, stack, and common rules.
A task is not complete when code changes are implemented. A task is complete only when implementation, validation, and SDD tracking updates are all done.
After each verified task, /execute-plan must update the relevant tracking files when applicable:
- `plan.md`
- `plan/tN.md`
- `delivery_artifacts/*.md`
- `facts/*.md`
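The definition of done is a conjunction of three conditions. This hypothetical model only illustrates that rule; `/execute-plan` is not claimed to track state this way.

```python
from dataclasses import dataclass


@dataclass
class TaskProgress:
    """Completion state of one plan/tN.md task (hypothetical model)."""
    implemented: bool = False       # code changes made within the task's allowed files
    validated: bool = False         # the task's validation/tests ran and passed
    tracking_updated: bool = False  # plan.md, plan/tN.md, delivery_artifacts, facts updated

    @property
    def complete(self) -> bool:
        # Implementation alone never makes a task done.
        return self.implemented and self.validated and self.tracking_updated


code_only = TaskProgress(implemented=True)
fully_done = TaskProgress(implemented=True, validated=True, tracking_updated=True)
print(code_only.complete, fully_done.complete)  # False True
```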
Invoke the reviewer agent.
The review should check the diff against:
- refined `spec.md`
- `research.md`
- `delivery_artifacts/*.md`
- `facts/*.md`
- `plan.md`
- completed `plan/tN.md` tasks
The reviewer should verify that:
- all delivery artifacts were produced or explicitly deferred
- facts marked `@implemented` have executable checks that exist and passed
- completed tasks match the actual diff
Invoke the antagonist agent when the work is ready to become a PR.
The antagonist is not a normal reviewer.
The reviewer asks:
Is this implementation correct against the plan?
The antagonist asks:
Why should this not be merged yet?
Use it especially for L1+ and L2 work:
- production-impacting changes
- migrations
- observability changes
- dual-run systems
- dependency removal
- cross-service changes
- queues, jobs, or consumers
- personal data
- permissions, authentication, or authorization
- rollout or rollback risk
The antagonist should look for:
- unmet requirements
- facts that do not really prove the requirement
- delivery artifacts marked complete without evidence
- unsafe rollout
- missing rollback or reversibility
- staging/production mismatch
- hidden coupling
- premature dependency removal
- operational blind spots
- scope creep
The output should be sharp and adversarial:
BLOCK | CAUTION | PASS
The goal is to raise red flags before GitHub review, especially when AI-assisted implementation reduced the human owner's cognitive load.
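This README does not say how individual findings combine into one verdict. One plausible rule, assumed here, is that the most severe finding wins, so a single `BLOCK` blocks the PR. `Verdict` and `overall_verdict` are hypothetical names.

```python
from enum import IntEnum


class Verdict(IntEnum):
    # Ordered by severity so max() picks the worst finding.
    PASS = 0
    CAUTION = 1
    BLOCK = 2


def overall_verdict(findings):
    """Assumed rule: the PR verdict is the most severe individual finding."""
    return max((verdict for verdict, _ in findings), default=Verdict.PASS)


findings = [
    (Verdict.CAUTION, "rollback path is not documented"),
    (Verdict.BLOCK, "no fact proves legal messages are excluded"),
]
print(overall_verdict(findings).name)  # BLOCK
```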
Specs should use EARS for behavioural requirements.
Each important requirement should:
- have a stable id:
REQ-001,REQ-002, ... - use
shall - be atomic
- be testable or verifiable
- avoid vague words like "properly", "easy", "fast", or "robust" unless quantified
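Some of these rules can be checked mechanically. The sketch below is a heuristic linter, not a tool shipped with this repository: it treats more than one `shall` as a hint that a requirement is not atomic, and it flags vague words even when they are quantified, leaving the final call to a human. Testability cannot be checked this way.

```python
import re

REQ_ID = re.compile(r"REQ-\d{3,}")
VAGUE_WORDS = ("properly", "easy", "fast", "robust")


def lint_requirement(req_id: str, text: str) -> list[str]:
    """Heuristic checks for the requirement rules above; returns human-readable problems."""
    problems = []
    if not REQ_ID.fullmatch(req_id):
        problems.append(f"{req_id}: id should look like REQ-001")
    shall_count = len(re.findall(r"\bshall\b", text))
    if shall_count == 0:
        problems.append(f"{req_id}: missing 'shall'")
    elif shall_count > 1:
        # Several 'shall' clauses usually mean the requirement is not atomic.
        problems.append(f"{req_id}: {shall_count} 'shall' clauses; consider splitting")
    for word in VAGUE_WORDS:
        # Flags the word even if it is quantified nearby; a human decides.
        if re.search(rf"\b{word}\b", text, re.IGNORECASE):
            problems.append(f"{req_id}: vague word '{word}'; quantify or remove it")
    return problems


print(lint_requirement("REQ-001", "The system shall record the attempt grouped by channel."))  # []
print(lint_requirement("R1", "The system should respond fast."))
```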
Preferred EARS patterns:
The system shall <response>.
When <trigger>, the system shall <response>.
While <state>, the system shall <response>.
Where <feature or condition>, the system shall <response>.
If <unwanted event>, then the system shall <response>.
Example:
REQ-001:
When a provider output attempt is made, the system shall record the attempt grouped by channel.
REQ-002:
When recording provider output metrics, the system shall not include user-level identifiers as metric labels.
REQ-003:
If the provider output attempt fails, then the system shall preserve the existing retry behaviour.
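The five EARS patterns are regular enough to recognise with one regular expression each. This sketch classifies the example requirements using the standard EARS pattern names (ubiquitous, event-driven, state-driven, optional-feature, unwanted-behaviour). It assumes the subject is literally "the system"; real specs that name a component would need a looser pattern.

```python
import re
from typing import Optional

# One regex per EARS pattern above; assumes the subject is literally "the system".
EARS_PATTERNS = [
    ("unwanted-behaviour", re.compile(r"If .+, then the system shall .+\.")),
    ("event-driven", re.compile(r"When .+, the system shall .+\.")),
    ("state-driven", re.compile(r"While .+, the system shall .+\.")),
    ("optional-feature", re.compile(r"Where .+, the system shall .+\.")),
    ("ubiquitous", re.compile(r"The system shall .+\.")),
]


def ears_pattern(sentence: str) -> Optional[str]:
    """Return the EARS pattern a requirement sentence follows, or None."""
    for name, pattern in EARS_PATTERNS:
        if pattern.fullmatch(sentence.strip()):
            return name
    return None


requirements = {
    "REQ-001": "When a provider output attempt is made, the system shall record the attempt grouped by channel.",
    "REQ-002": "When recording provider output metrics, the system shall not include user-level identifiers as metric labels.",
    "REQ-003": "If the provider output attempt fails, then the system shall preserve the existing retry behaviour.",
}
print({req_id: ears_pattern(text) for req_id, text in requirements.items()})
# {'REQ-001': 'event-driven', 'REQ-002': 'event-driven', 'REQ-003': 'unwanted-behaviour'}
```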
Acceptance scenarios may use Given/When/Then, but they do not replace EARS requirements.
Facts turn important requirements into executable or verifiable assertions.
The short version:
Specs explain intent.
Facts prove behaviour.
A fact should usually reference one or more REQ-* requirements.
Example:
FACT-001:
When provider output is emitted, the system records both Datadog and OpenTelemetry signals during the dual-run phase.
Requirement:
- REQ-001
Executable check:
- resolved from project rules
Facts should not be marked @implemented unless the executable check exists and passed.
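The `@implemented` gate can be pictured as a guarded status change. In this sketch `Fact` and `mark_implemented` are hypothetical, the check id `test_dual_run_emits_both_signals` is invented for illustration, and a fact is assumed to carry a single status tag at a time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    """Hypothetical in-memory view of one facts/*.md entry."""
    fact_id: str
    requirements: list[str]           # REQ-* ids this fact proves
    check: Optional[str] = None       # executable check, e.g. a test id (format assumed)
    check_passed: bool = False        # outcome of the most recent run of that check
    status: str = "@spec"             # assumed: one status tag at a time


def mark_implemented(fact: Fact) -> None:
    """Promote a fact to @implemented only if its executable check exists and passed."""
    if fact.check is None:
        raise ValueError(f"{fact.fact_id}: no executable check")
    if not fact.check_passed:
        raise ValueError(f"{fact.fact_id}: executable check has not passed")
    fact.status = "@implemented"


fact = Fact("FACT-001", ["REQ-001"], check="test_dual_run_emits_both_signals")
try:
    mark_implemented(fact)
except ValueError as err:
    print(err)  # FACT-001: executable check has not passed
fact.check_passed = True
mark_implemented(fact)
print(fact.status)  # @implemented
```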
| Artifact | Responsibility |
|---|---|
| `spec.md` | Behavioural source of truth |
| `research.md` | Repository-evidence source of truth |
| `delivery_artifacts/*.md` | Production-scope source of truth |
| `facts/*.md` | Executable-verification source of truth |
| `plan.md` | Execution source of truth |
| `plan/tN.md` | Atomic task instructions |
| tests/contracts/schemas/ADRs/dashboards/alerts/code | Permanent artifacts |
Imagine we want to add a small feature:
When a user receives a very boring notification, the system should add a tiny fun fact to make it less depressing.
Start with a rough prompt:
Help me write the specification for adding a tiny fun fact to boring notifications.
The goal is to make low-priority notifications feel a bit more human without changing critical or legal messages.
Then run the flow:
/sdd-start doc/playbook/20260601_fun_fact_notifications
Claude should ask questions if the requirement is unclear, then create spec.md.
Next:
/sdd-research doc/playbook/20260601_fun_fact_notifications
This checks the repo and answers questions like:
- where notifications are built
- whether notification priority already exists
- whether legal/critical messages can be detected
- where tests already live
Then:
/sdd-refine doc/playbook/20260601_fun_fact_notifications
This turns the original intent into a repo-aware spec.
Then:
/sdd-delivery-artifacts doc/playbook/20260601_fun_fact_notifications
This lists what must be produced, for example:
delivery_artifacts/
├── 01-notification-content.md
└── 02-safety-rules.md
Then:
/sdd-facts doc/playbook/20260601_fun_fact_notifications
This defines what must be proven, for example:
FACT-001: boring low-priority notifications may receive a fun fact
FACT-002: critical/legal notifications must never receive a fun fact
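To make the two facts concrete, here is one hypothetical shape of the feature and the deterministic checks that would prove them. Everything is assumed: the `Notification` fields, the `priority` values, the `legal` flag (research would establish how legal messages are actually detected), `render`, and the fun-fact source. The exclusion guard runs before any randomness so that FACT-002 holds on every call, and the check for it uses `rate=1.0`, the case where a missing guard would show up.

```python
import random
from dataclasses import dataclass

FUN_FACTS = ["Octopuses have three hearts."]  # hypothetical content source


@dataclass
class Notification:
    body: str
    priority: str        # hypothetical values: "low", "normal", "critical"
    legal: bool = False  # hypothetical flag; research.md decides how legal messages are detected


def render(n: Notification, rng: random.Random, rate: float = 0.5) -> str:
    """Append a fun fact to some low-priority notifications, never to critical/legal ones."""
    if n.legal or n.priority == "critical":
        return n.body  # FACT-002: excluded before any randomness is involved
    if n.priority == "low" and rng.random() < rate:
        return f"{n.body}\n\nFun fact: {rng.choice(FUN_FACTS)}"  # FACT-001
    return n.body


def check_fact_001() -> None:
    n = Notification("Your weekly summary is ready.", priority="low")
    assert "Fun fact:" in render(n, random.Random(0), rate=1.0)


def check_fact_002() -> None:
    # rate=1.0 is the worst case: a fun fact would be added if the guard were missing.
    for n in (Notification("Our terms have changed.", "low", legal=True),
              Notification("Your password was changed.", "critical")):
        assert render(n, random.Random(0), rate=1.0) == n.body


check_fact_001()
check_fact_002()
print("FACT-001 and FACT-002 checks passed")
```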
Then invoke the planner:
Use the planner agent for doc/playbook/20260601_fun_fact_notifications
The planner creates:
plan.md
plan/
├── t1.md
├── t2.md
└── t3.md
Then execute:
/execute-plan doc/playbook/20260601_fun_fact_notifications
After implementation, run the normal review:
Use the reviewer agent for doc/playbook/20260601_fun_fact_notifications
And before opening the PR, run the adversarial review:
Use the antagonist agent for doc/playbook/20260601_fun_fact_notifications.
Assume this may be subtly wrong or unsafe.
Find reasons this should not be merged yet.
The antagonist might flag something like:
BLOCK: The spec says legal notifications must never receive fun facts,
but there is no fact proving that legal messages are excluded.
That is the point of the workflow: not to generate more documents for fun, but to catch the thing we would otherwise miss.
This workflow was influenced by these articles on Spec-Driven Development and Facts:
- Stop Writing Specs. Start Writing Facts. The Entire SDD Movement Is Already Obsolete
- Comparing 15 Spec-Driven Development Frameworks
Forked and adapted from: