Problem
Five agents run bare `engine: copilot`, so their model floats on `GH_AW_MODEL_AGENT_COPILOT` (default `claude-sonnet-4.6`, consumer-overridable): `sdd-spec`, `sdd-triage`, `sdd-dispatch`, `sdd-validate`, `sdd-review`. #248 showed why floating is a footgun: a consumer's model var silently resolved `distillery-sync` to opus and blew the per-run effective-token cap (429). Any unpinned agent is one consumer var away from a silent re-model: cost and behavior both change without a diff anywhere in this repo.
The model is also a per-agent fit question, not a global default. The pipeline's cost shape is asymmetric: triage runs once per feature and its errors multiply downstream (one bad decomposition = N bad task PRs, #252); execute/validate/review run per task.
Proposal
Pin `engine: {id: copilot, model: ...}` per agent (the literal compiles into the lock and wins over consumer vars, per the #248 precedent):
| Agent | Pin | Rationale |
| --- | --- | --- |
| `sdd-triage` | `claude-opus-4.6` | Heaviest reasoner (architecture, task DAG, assumption ledger, sizing). Runs once per feature; errors fan out multiplicatively. Best place in the system to pay for the top tier. |
| `sdd-spec` | `claude-sonnet-4.6` | Judgment-heavy authoring + the agile/full classifier, but bounded by human gates (spec PR review, `/approve`). Pin for classifier consistency across consumers. |
| `sdd-validate` | `claude-sonnet-4.6` | Structured verification against explicit R-IDs/proof artifacts. Guards merge: too important for haiku, too structured for opus. |
| `sdd-review` | `claude-sonnet-4.6` | One of three review nets (CodeRabbit, consumer CI). |
| `sdd-dispatch` | `claude-haiku-4.5` | Judgment moved to composite actions (`sdd-dispatch-compute`, dedupe, promote-ready, cycle-detect); remaining agent work is orchestration. Same logic as the `distillery-sync` pin. |
| `sdd-execute-{haiku,sonnet,opus}` | unchanged | Already tier-bound via `model:*` task label; verify the opus tier tracks the newest opus the Copilot proxy offers. |
| `distillery-sync` | unchanged | Pinned `claude-haiku-4.5` since #248. |
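The precedence this proposal relies on can be sketched as below. This is an illustrative model only, not the compiler's actual code: `resolve_model` and its parameters are hypothetical names. The only facts taken from this issue are that a literal pin wins over the consumer variable and that the floating default is `claude-sonnet-4.6`.

```python
from typing import Optional

DEFAULT_MODEL = "claude-sonnet-4.6"  # floating default named in this issue


def resolve_model(pinned: Optional[str], consumer_var: Optional[str]) -> str:
    """Hypothetical resolution order: literal pin > consumer var > default."""
    if pinned:        # literal compiled into the lock wins
        return pinned
    if consumer_var:  # value of GH_AW_MODEL_AGENT_COPILOT set by the consumer
        return consumer_var
    return DEFAULT_MODEL


# A consumer var re-models a floating agent but not a pinned one.
print(resolve_model(None, "claude-opus-4.6"))
print(resolve_model("claude-haiku-4.5", "claude-opus-4.6"))
```

Under this model, the #248 failure is the second branch firing on an agent that should have taken the first.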
Open questions
- Keep a consumer override anywhere? Recommendation: hard-pin the mechanical agents; honor `GH_AW_MODEL_AGENT_COPILOT` only on `sdd-spec`/`sdd-review` (where model taste plausibly varies per consumer), and document that the rest are pinned by design.
- `SDD_AUTO_MERGE=1` repos: `sdd-review` is the last line before unattended merge. Worth an opus override knob (e.g. `SDD_REVIEW_MODEL`) or just documentation?
- Confirm which model IDs the Copilot proxy currently serves before pinning (the opus tier may have a newer option than `claude-opus-4.6`).
- Cost telemetry: the ADR 0020 OTLP spans carry token usage per agent; worth a baseline read before/after to validate that the triage-up/dispatch-down trade nets near zero.
Acceptance
- Every `engine:`-bearing agent either pins an explicit model or documents why it floats.
- A consumer setting `GH_AW_MODEL_AGENT_COPILOT` cannot re-model a hard-pinned agent (lock emits the literal).
- `docs/sdd/install.md` documents the per-agent models and any remaining override surface.
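The first acceptance criterion could be checked mechanically along these lines. This is a minimal sketch under stated assumptions: `unpinned_agents`, the `engines` mapping, and `documented_floats` are hypothetical, and the sketch assumes each agent's `engine:` value has already been parsed into either a bare string or a dict.

```python
from typing import Dict, List, Set, Union

Engine = Union[str, Dict[str, str]]  # bare "copilot" or {"id": ..., "model": ...}


def unpinned_agents(engines: Dict[str, Engine], documented_floats: Set[str]) -> List[str]:
    """Return agents that neither pin a model nor are documented as floating."""
    violations = []
    for agent, engine in engines.items():
        # An agent counts as pinned only if its engine mapping names a model.
        pinned = isinstance(engine, dict) and bool(engine.get("model"))
        if not pinned and agent not in documented_floats:
            violations.append(agent)
    return sorted(violations)


# Illustrative input only; not the repo's actual state.
engines = {
    "sdd-triage": {"id": "copilot", "model": "claude-opus-4.6"},
    "sdd-dispatch": {"id": "copilot", "model": "claude-haiku-4.5"},
    "sdd-spec": "copilot",    # bare engine, floats
    "sdd-review": "copilot",  # bare engine, floats
}
print(unpinned_agents(engines, documented_floats={"sdd-spec"}))
```

An empty result would mean every agent either pins a model or has a documented reason to float.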
Filed from a model-fit review of all agents; informed by the consumer pilot run.