sdd-triage burns context to the effective-token rail: structural context reduction #271

## Problem

An `sdd-triage` phase-A run (consumer pilot run) failed on the AWF effective-token hard rail:

```
429 Maximum effective tokens exceeded (25096706.10 / 25000000)
```

The run consumed 2.5M input tokens across 33 model API calls and produced **zero safe output** (`{"items":[]}`). Per-call context grew monotonically from 33K to 122K tokens, and the prompt cache broke mid-run (calls 29/31/32 reset `cache_read` to the 32K base), re-billing the full context uncached.

The invocation cap in #270 is a coarse backstop — it converts a silent rail-death into a recoverable `noop`, but **does not stop context from growing**. At any call count a pathological run still climbs toward the 25M rail. This issue tracks the structural cure: keep per-call context small enough that the rail is never approached.
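As a rough check on the figures above, the sketch below sums per-call input under the assumption that context grew linearly from 33K to 122K tokens over the 33 calls. The linear shape and the function name are assumptions for illustration; the run's real per-call sizes are not reproduced here, and the effective-token weighting applied by the rail is not modeled.

```python
def cumulative_input_tokens(first_call: int, last_call: int, calls: int) -> float:
    """Total input tokens if per-call context grows linearly (an assumption)."""
    if calls < 1:
        raise ValueError("calls must be >= 1")
    if calls == 1:
        return float(first_call)
    step = (last_call - first_call) / (calls - 1)
    # Every call re-sends the whole context, so the total is the sum of all
    # per-call sizes, not the size of the final call.
    return sum(first_call + i * step for i in range(calls))


total = cumulative_input_tokens(33_000, 122_000, 33)
print(f"{total:,.0f} input tokens")
```

Under that assumption the total comes to roughly 2.56M, in line with the reported 2.5M. The cost is the area under the growth curve, which is why a cap on call count limits the damage but does not change the cost of each call.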

## Root cause

`sdd-triage` is a single 1070-line, 7-situation mega-workflow. The compiled prompt is ~103KB / ~33K base tokens and loads, **on every run regardless of which event fired**:

- all three phases (A design / B plan / C materialize), though exactly one runs per invocation
- six imported fragments including both heavy MCP fragments — `sdd-mcp-serena.md` (13.7KB) and `sdd-mcp-distillery.md` (7.7KB)
- the full GitHub `default` toolset, whose tool definitions sit in context every turn

In the failing run, **Serena was loaded but never called** (0 tool calls; github=7, distillery=2). Its tool schemas were pure per-turn overhead.

## Proposed work

### A. Per-phase prompt + MCP scoping (biggest win)

Phases A/B/C are mutually exclusive per invocation, and the wrapper's `sdd-route-triage` action already resolves `phase` and `item_number` deterministically before the agent starts. Use that to load only the active phase's instructions and only the MCP servers that phase needs:

- Serena is phase-A / phase-B-preview only — phase C (materialize) and the early-`noop` paths don't need it.
- Distillery is phase-A / phase-B only.

Options to evaluate: conditional `{{#if}}` blocks keyed on the resolved phase; splitting into per-phase workflows; or sub-skills (progressive disclosure) so verbose phase playbooks load only when that phase runs. gh-aw loads MCP servers per-workflow, not per-phase, so MCP scoping likely requires the split or a capability-gated import.
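The scoping rule in the two bullets above can be written down as a lookup table. This is a minimal sketch only: the server names, the `noop` key, and `servers_for_phase` are hypothetical, and nothing here reflects how gh-aw actually declares or loads MCP servers.

```python
# Hypothetical phase -> MCP server table, taken from the two bullets above.
# Server names and the "noop" key are illustrative, not gh-aw configuration.
PHASE_MCP_SERVERS: dict[str, frozenset[str]] = {
    "A": frozenset({"serena", "distillery"}),
    "B": frozenset({"serena", "distillery"}),  # Serena only for the phase-B preview
    "C": frozenset(),
    "noop": frozenset(),
}


def servers_for_phase(phase: str) -> frozenset[str]:
    """Return the MCP servers a resolved phase needs; fail loudly on unknown phases."""
    try:
        return PHASE_MCP_SERVERS[phase]
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None
```

Whichever option is chosen (conditional blocks, a workflow split, or sub-skills), a table like this could serve as the single source of truth that the per-phase configuration is checked against.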

### B. Pre-fetch the resolved entity's reads (DataOps)

The failing run re-read the same tracking issue 4× via `issue_read`. The route action already knows `item_number`; a deterministic pre-step can materialize the tracking issue body + comments + sub-issue list into `/tmp/gh-aw/` as compact JSON, so the agent reads a file instead of looping on API calls. The pre-step must cover the varied reads of all 7 situations (arch PR diff, spec files, sub-issue tree), so its design should account for each.
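A minimal sketch of the materialization half of such a pre-step, assuming the issue, its comments, and its sub-issues have already been fetched as GitHub REST-style JSON objects. The field selection, function names, and output file name are hypothetical; fetching, and the arch PR diff and spec-file reads, are left out.

```python
import json
from pathlib import Path


def compact_snapshot(issue: dict, comments: list[dict], sub_issues: list[dict]) -> dict:
    """Keep only the fields the agent is assumed to need (hypothetical selection)."""
    return {
        "number": issue["number"],
        "title": issue.get("title", ""),
        "body": issue.get("body") or "",
        "comments": [
            {"author": (c.get("user") or {}).get("login"), "body": c.get("body") or ""}
            for c in comments
        ],
        "sub_issues": [
            {"number": s["number"], "title": s.get("title", ""), "state": s.get("state")}
            for s in sub_issues
        ],
    }


def write_snapshot(out_dir: Path, snapshot: dict) -> Path:
    """Write compact (no-whitespace) JSON; the file name is illustrative."""
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"tracking-issue-{snapshot['number']}.json"
    path.write_text(json.dumps(snapshot, separators=(",", ":")), encoding="utf-8")
    return path
```

In the workflow, `out_dir` would be `Path("/tmp/gh-aw")`. Dropping unused fields before writing matters as much as avoiding the repeated calls, since whatever lands in the file is what the agent pulls into context.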

### C. Tool-surface reduction

Evaluate `tools.github: mode: gh-proxy` to drop the GitHub MCP Docker server and its tool-definition block from every turn. **Caveat:** this workflow is sub-issue-tree heavy (`get_sub_issues` parent/child reads), which `gh` CLI only does via raw `gh api` — not a clean drop-in, and could increase turns. Likely belongs *after* (A), once per-phase scoping decides which reads each phase needs.

### D. Cache hygiene

Investigate the mid-run cache break (calls 29/31/32). Ensure all static fragments strictly precede any dynamic/event content in the compiled prompt so the stable prefix stays cached across turns.
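The ordering rule can be illustrated with a small sketch that compares how much leading text two consecutive turns share. Character-level prefix length is used here as a stand-in for token-level prefix caching; the fragment strings and function names are placeholders, and real provider cache behavior is not modeled.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def assemble(static_fragments: list[str], dynamic: str, static_first: bool = True) -> str:
    """Join prompt parts; static_first=True keeps the stable text as the prefix."""
    parts = [*static_fragments, dynamic] if static_first else [dynamic, *static_fragments]
    return "\n".join(parts)


static = ["PHASE PLAYBOOK", "TOOL GUIDE"]  # placeholder fragment text
turn1 = assemble(static, "event: issue opened")
turn2 = assemble(static, "event: comment added")
bad1 = assemble(static, "event: issue opened", static_first=False)
bad2 = assemble(static, "event: comment added", static_first=False)

print(shared_prefix_len(turn1, turn2), shared_prefix_len(bad1, bad2))
```

With static fragments first, the shared prefix covers all of the static text; with the dynamic content first, the prompts diverge almost immediately and the static text after the divergence point cannot be reused as a prefix. A check along these lines against the compiled prompt could help confirm whether fragment ordering contributed to the break.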

## Non-goal: model downgrade

`gh aw audit` flags "downgrade to haiku" because it classifies this as generic triage. Phase A does real architecture synthesis — **do not downgrade**. The win is architecture/context, not model tier.

## Acceptance

- A phase-A run on a representative feature stays well under the 25M effective-token rail
- Serena/Distillery load only on the phases that call them
- Quality parity: architecture record, plan comment, and Unit/task tree match pre-change output on a validation issue
- Back the change with an `experiments:` entry (`metric: "aic"`) comparing before/after

## References

- Backstop PR: #270
- Hard rail config: compiled lock `apiProxy.maxEffectiveTokens` = 25000000
