Pin an explicit model per agent; stop floating on GH_AW_MODEL_AGENT_COPILOT #269

@norrietaylor

Problem

Five agents (sdd-spec, sdd-triage, sdd-dispatch, sdd-validate, sdd-review) run bare engine: copilot, so their model floats on GH_AW_MODEL_AGENT_COPILOT (default claude-sonnet-4.6, consumer-overridable). #248 showed why floating is a footgun: a consumer's model var silently resolved distillery-sync to opus and blew the per-run effective-token cap (429). Any unpinned agent is one consumer var away from a silent re-model: cost and behavior both change without a diff anywhere in this repo.
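
The floating behavior can be sketched as a precedence rule. This is a minimal model, not the compiler's actual code: `resolve_model` and the shape of `consumer_vars` are hypothetical, the opus model ID is only an illustrative value, and the precedence (pinned literal, then consumer var, then default) is the one described in this issue.

```python
DEFAULT_MODEL = "claude-sonnet-4.6"  # default stated in this issue


def resolve_model(pinned, consumer_vars):
    """Return the model an agent would run with (hypothetical helper).

    pinned: model literal compiled into the lock, or None for bare `engine: copilot`.
    consumer_vars: mapping of consumer-set variables (assumed shape).
    """
    if pinned is not None:
        return pinned  # a pinned literal wins over any consumer var
    # Unpinned agents float on the consumer var, falling back to the default.
    return consumer_vars.get("GH_AW_MODEL_AGENT_COPILOT") or DEFAULT_MODEL


# Illustrative consumer override; the exact ID a consumer would set is an assumption.
consumer = {"GH_AW_MODEL_AGENT_COPILOT": "claude-opus-4.6"}

print(resolve_model(None, consumer))                # unpinned: follows the consumer var
print(resolve_model("claude-haiku-4.5", consumer))  # pinned: the literal holds
print(resolve_model(None, {}))                      # unpinned, no var: the default
```

The first call is the #248 failure mode in miniature: nothing in this repo changes, yet the agent runs on a different model.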

The model is also a per-agent fit question, not a global default. The pipeline's cost shape is asymmetric: triage runs once per feature and its errors multiply downstream (one bad decomposition = N bad task PRs, #252); execute/validate/review run per task.

Proposal

Pin engine: {id: copilot, model: ...} per agent (the literal compiles into the lock and wins over consumer vars, per the #248 precedent):

| Agent | Pin | Rationale |
| --- | --- | --- |
| sdd-triage | claude-opus-4.6 | Heaviest reasoner (architecture, task DAG, assumption ledger, sizing). Runs once per feature; errors fan out multiplicatively. Best place in the system to pay for the top tier. |
| sdd-spec | claude-sonnet-4.6 | Judgment-heavy authoring + the agile/full classifier, but bounded by human gates (spec PR review, /approve). Pin for classifier consistency across consumers. |
| sdd-validate | claude-sonnet-4.6 | Structured verification against explicit R-IDs/proof artifacts. Guards merge: too important for haiku, too structured for opus. |
| sdd-review | claude-sonnet-4.6 | One of three review nets (CodeRabbit, consumer CI). |
| sdd-dispatch | claude-haiku-4.5 | Judgment moved to composite actions (sdd-dispatch-compute, dedupe, promote-ready, cycle-detect); remaining agent work is orchestration. Same logic as the distillery-sync pin. |
| sdd-execute-{haiku,sonnet,opus} | unchanged | Already tier-bound via model:* task label; verify the opus tier tracks the newest opus the Copilot proxy offers. |
| distillery-sync | unchanged | Pinned claude-haiku-4.5 since #248. |
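
As a rough illustration, the five proposed pins can be held as data and rendered into the engine line each agent would carry. `PROPOSED_PINS` and `engine_line` are hypothetical names; the model IDs are the ones proposed above and are still subject to open question 3.

```python
# Proposed pins from the table above. The "unchanged" rows
# (sdd-execute-*, distillery-sync) are deliberately left out.
PROPOSED_PINS = {
    "sdd-triage": "claude-opus-4.6",
    "sdd-spec": "claude-sonnet-4.6",
    "sdd-validate": "claude-sonnet-4.6",
    "sdd-review": "claude-sonnet-4.6",
    "sdd-dispatch": "claude-haiku-4.5",
}


def engine_line(agent):
    """Render the pinned engine setting for one agent (hypothetical helper)."""
    return f"engine: {{id: copilot, model: {PROPOSED_PINS[agent]}}}"


for agent in PROPOSED_PINS:
    print(f"{agent}: {engine_line(agent)}")
```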

Open questions

  1. Keep a consumer override anywhere? Recommendation: hard-pin the mechanical agents; honor GH_AW_MODEL_AGENT_COPILOT only on sdd-spec/sdd-review (where model taste plausibly varies per consumer), and document that the rest are pinned by design.
  2. SDD_AUTO_MERGE=1 repos: sdd-review is the last line before unattended merge — worth an opus override knob (e.g. SDD_REVIEW_MODEL) or just documentation?
  3. Confirm which model IDs the Copilot proxy currently serves before pinning (the opus tier may have a newer option than claude-opus-4.6).
  4. Cost telemetry: the ADR 0020 OTLP spans carry token usage per agent — worth a baseline read before/after to validate the triage-up/dispatch-down trade nets near zero.
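
For question 4, a before/after baseline could be as simple as summing token usage per agent and comparing the two windows. The sketch below assumes spans have already been exported into plain records with `agent` and `tokens` fields; that record shape, the function names, and the sample numbers are all made up for illustration and are not taken from ADR 0020.

```python
from collections import defaultdict


def tokens_by_agent(spans):
    """Sum token usage per agent from exported span records (assumed shape)."""
    totals = defaultdict(int)
    for span in spans:
        totals[span["agent"]] += span["tokens"]
    return dict(totals)


def net_delta(before, after):
    """Return (per-agent token change, overall net change) between two windows."""
    b, a = tokens_by_agent(before), tokens_by_agent(after)
    per_agent = {name: a.get(name, 0) - b.get(name, 0) for name in sorted(set(a) | set(b))}
    return per_agent, sum(per_agent.values())


# Synthetic placeholder records, not measurements.
before = [{"agent": "sdd-triage", "tokens": 100}, {"agent": "sdd-dispatch", "tokens": 100}]
after = [{"agent": "sdd-triage", "tokens": 130}, {"agent": "sdd-dispatch", "tokens": 60}]

per_agent, net = net_delta(before, after)
print(per_agent)
print(net)
```

Raw token counts ignore per-model pricing, so a real comparison would need to weight each agent's tokens by the rate of the model it ran on before concluding the trade nets near zero.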

Acceptance

  • Every engine:-bearing agent either pins an explicit model or documents why it floats.
  • A consumer setting GH_AW_MODEL_AGENT_COPILOT cannot re-model a hard-pinned agent (lock emits the literal).
  • docs/sdd/install.md documents the per-agent models and any remaining override surface.
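
The first acceptance item lends itself to an automated check. The sketch below assumes each agent's frontmatter has already been parsed into a dict, and that a floating agent records its reason under a `floats-because` key; that key and the `audit_engines` name are hypothetical, since the issue does not say where the documentation should live.

```python
def audit_engines(agents):
    """Return agents whose engine neither pins a model nor documents why it floats.

    agents: mapping of agent name -> parsed frontmatter dict (assumed shape).
    """
    failures = []
    for name, frontmatter in sorted(agents.items()):
        engine = frontmatter.get("engine")
        if engine is None:
            continue  # not an engine-bearing agent
        # Bare `engine: copilot` parses to a string; a pin parses to a mapping.
        pinned = isinstance(engine, dict) and bool(engine.get("model"))
        documented = bool(frontmatter.get("floats-because"))  # hypothetical key
        if not (pinned or documented):
            failures.append(name)
    return failures


agents = {
    "sdd-triage": {"engine": {"id": "copilot", "model": "claude-opus-4.6"}},
    "sdd-spec": {"engine": "copilot", "floats-because": "model taste varies per consumer"},
    "sdd-dispatch": {"engine": "copilot"},  # bare and undocumented
}

print(audit_engines(agents))
```

Run in CI, a non-empty result would fail the build, which keeps new agents from silently shipping with a floating model.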

Filed from a model-fit review of all agents; informed by the consumer pilot run.


Labels: enhancement
