Problem
An sdd-triage phase-A run (consumer pilot run) failed on the AWF effective-token hard rail:
429 Maximum effective tokens exceeded (25096706.10 / 25000000)
It consumed 2.5M input tokens across 33 model API calls and produced zero safe output ({"items":[]}). Per-call context grew monotonically 33K → 122K tokens; the prompt cache broke mid-run (calls 29/31/32 reset cache_read to the 32K base), re-billing the full context uncached.
The invocation cap in #270 is a coarse backstop — it converts a silent rail-death into a recoverable noop, but does not stop context from growing. At any call count a pathological run still climbs toward the 25M rail. This issue tracks the structural cure: keep per-call context small enough that the rail is never approached.
Root cause
sdd-triage is a single 1070-line, 7-situation mega-workflow. The compiled prompt is ~103KB / ~33K base tokens and loads, on every run regardless of which event fired:
- all three phases (A design / B plan / C materialize), though exactly one runs per invocation
- six imported fragments including both heavy MCP fragments —
sdd-mcp-serena.md (13.7KB) and sdd-mcp-distillery.md (7.7KB)
- the full GitHub
default toolset, whose tool definitions sit in context every turn
In the failing run, Serena was loaded but never called (0 tool calls; github=7, distillery=2). Its tool schemas were pure per-turn overhead.
Proposed work
A. Per-phase prompt + MCP scoping (biggest win)
Phases A/B/C are mutually exclusive per invocation, and the wrapper's sdd-route-triage action already resolves phase and item_number deterministically before the agent starts. Use that to load only the active phase's instructions and only the MCP servers that phase needs:
- Serena is phase-A / phase-B-preview only — phase C (materialize) and the early-
noop paths don't need it.
- Distillery is phase-A / phase-B only.
Options to evaluate: conditional {{#if}} blocks keyed on the resolved phase; splitting into per-phase workflows; or sub-skills (progressive disclosure) so verbose phase playbooks load only when that phase runs. gh-aw loads MCP servers per-workflow, not per-phase, so MCP scoping likely requires the split or a capability-gated import.
B. Pre-fetch the resolved entity's reads (DataOps)
The failing run re-read the same tracking issue 4× via issue_read. The route action already knows item_number; a deterministic pre-step can materialize the tracking issue body + comments + sub-issue list into /tmp/gh-aw/ as compact JSON, so the agent reads a file instead of looping on API calls. Must handle the 7 situations' varied reads (arch PR diff, spec files, sub-issue tree) — design accordingly.
C. Tool-surface reduction
Evaluate tools.github: mode: gh-proxy to drop the GitHub MCP Docker server and its tool-definition block from every turn. Caveat: this workflow is sub-issue-tree heavy (get_sub_issues parent/child reads), which gh CLI only does via raw gh api — not a clean drop-in, and could increase turns. Likely belongs after (A), once per-phase scoping decides which reads each phase needs.
D. Cache hygiene
Investigate the mid-run cache break (calls 29/31/32). Ensure all static fragments strictly precede any dynamic/event content in the compiled prompt so the stable prefix stays cached across turns.
Non-goal: model downgrade
gh aw audit flags "downgrade to haiku" because it classifies this as generic triage. Phase A does real architecture synthesis — do not downgrade. The win is architecture/context, not model tier.
Acceptance
- A phase-A run on a representative feature stays well under the 25M effective-token rail with comfortable headroom
- Serena/Distillery load only on the phases that call them
- Quality parity: architecture record, plan comment, and Unit/task tree match pre-change output on a validation issue
- Back the change with an
experiments: entry (metric: "aic") comparing before/after
References
Problem
An
sdd-triagephase-A run (consumer pilot run) failed on the AWF effective-token hard rail:It consumed 2.5M input tokens across 33 model API calls and produced zero safe output (
{"items":[]}). Per-call context grew monotonically 33K → 122K tokens; the prompt cache broke mid-run (calls 29/31/32 resetcache_readto the 32K base), re-billing the full context uncached.The invocation cap in #270 is a coarse backstop — it converts a silent rail-death into a recoverable
noop, but does not stop context from growing. At any call count a pathological run still climbs toward the 25M rail. This issue tracks the structural cure: keep per-call context small enough that the rail is never approached.Root cause
sdd-triageis a single 1070-line, 7-situation mega-workflow. The compiled prompt is ~103KB / ~33K base tokens and loads, on every run regardless of which event fired:sdd-mcp-serena.md(13.7KB) andsdd-mcp-distillery.md(7.7KB)defaulttoolset, whose tool definitions sit in context every turnIn the failing run, Serena was loaded but never called (0 tool calls; github=7, distillery=2). Its tool schemas were pure per-turn overhead.
Proposed work
A. Per-phase prompt + MCP scoping (biggest win)
Phases A/B/C are mutually exclusive per invocation, and the wrapper's
sdd-route-triageaction already resolvesphaseanditem_numberdeterministically before the agent starts. Use that to load only the active phase's instructions and only the MCP servers that phase needs:nooppaths don't need it.Options to evaluate: conditional
{{#if}}blocks keyed on the resolved phase; splitting into per-phase workflows; or sub-skills (progressive disclosure) so verbose phase playbooks load only when that phase runs. gh-aw loads MCP servers per-workflow, not per-phase, so MCP scoping likely requires the split or a capability-gated import.B. Pre-fetch the resolved entity's reads (DataOps)
The failing run re-read the same tracking issue 4× via
issue_read. The route action already knowsitem_number; a deterministic pre-step can materialize the tracking issue body + comments + sub-issue list into/tmp/gh-aw/as compact JSON, so the agent reads a file instead of looping on API calls. Must handle the 7 situations' varied reads (arch PR diff, spec files, sub-issue tree) — design accordingly.C. Tool-surface reduction
Evaluate
tools.github: mode: gh-proxyto drop the GitHub MCP Docker server and its tool-definition block from every turn. Caveat: this workflow is sub-issue-tree heavy (get_sub_issuesparent/child reads), whichghCLI only does via rawgh api— not a clean drop-in, and could increase turns. Likely belongs after (A), once per-phase scoping decides which reads each phase needs.D. Cache hygiene
Investigate the mid-run cache break (calls 29/31/32). Ensure all static fragments strictly precede any dynamic/event content in the compiled prompt so the stable prefix stays cached across turns.
Non-goal: model downgrade
gh aw auditflags "downgrade to haiku" because it classifies this as generic triage. Phase A does real architecture synthesis — do not downgrade. The win is architecture/context, not model tier.Acceptance
experiments:entry (metric: "aic") comparing before/afterReferences
apiProxy.maxEffectiveTokens= 25000000