sdd-triage burns context to the effective-token rail: structural context reduction #271

@norrietaylor

Problem

An sdd-triage phase-A run (consumer pilot run) failed on the AWF effective-token hard rail:

429 Maximum effective tokens exceeded (25096706.10 / 25000000)

It consumed 2.5M input tokens across 33 model API calls and produced zero safe output ({"items":[]}). Per-call context grew monotonically 33K → 122K tokens; the prompt cache broke mid-run (calls 29/31/32 reset cache_read to the 32K base), re-billing the full context uncached.
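The 2.5M figure is consistent with the reported per-call growth. As a rough check, and assuming context grew linearly from 33K to 122K over the 33 calls (an assumption; only the endpoints are reported above), the summed input comes to about 2.56M tokens:

```python
def total_input_tokens(first: int, last: int, calls: int) -> float:
    """Sum of per-call input tokens under an ASSUMED linear ramp."""
    # Arithmetic series: calls * (first + last) / 2
    return calls * (first + last) / 2


# Endpoints and call count taken from the failing run described above.
print(total_input_tokens(33_000, 122_000, 33))  # 2557500.0
```

Because the sum scales with both the call count and the per-call size, capping calls alone bounds only one factor; shrinking the per-call context bounds the other.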

The invocation cap in #270 is a coarse backstop — it converts a silent rail-death into a recoverable noop, but does not stop context from growing. At any call count a pathological run still climbs toward the 25M rail. This issue tracks the structural cure: keep per-call context small enough that the rail is never approached.

Root cause

sdd-triage is a single 1070-line, 7-situation mega-workflow. The compiled prompt is ~103KB / ~33K base tokens and loads, on every run regardless of which event fired:

  • all three phases (A design / B plan / C materialize), though exactly one runs per invocation
  • six imported fragments including both heavy MCP fragments — sdd-mcp-serena.md (13.7KB) and sdd-mcp-distillery.md (7.7KB)
  • the full GitHub default toolset, whose tool definitions sit in context every turn

In the failing run, Serena was loaded but never called (0 tool calls; github=7, distillery=2). Its tool schemas were pure per-turn overhead.

Proposed work

A. Per-phase prompt + MCP scoping (biggest win)

Phases A/B/C are mutually exclusive per invocation, and the wrapper's sdd-route-triage action already resolves phase and item_number deterministically before the agent starts. Use that to load only the active phase's instructions and only the MCP servers that phase needs:

  • Serena is phase-A / phase-B-preview only — phase C (materialize) and the early-noop paths don't need it.
  • Distillery is phase-A / phase-B only.

Options to evaluate: conditional {{#if}} blocks keyed on the resolved phase; splitting into per-phase workflows; or sub-skills (progressive disclosure) so verbose phase playbooks load only when that phase runs. gh-aw loads MCP servers per-workflow, not per-phase, so MCP scoping likely requires the split or a capability-gated import.
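Whichever option is chosen, the scoping rule itself is a small lookup from the resolved phase to what gets loaded. A minimal sketch, assuming hypothetical per-phase playbook file names (`sdd-triage-phase-*.md` do not exist today) and treating the B-preview use of Serena as "phase B loads Serena"; only the two `sdd-mcp-*.md` fragment names come from this issue:

```python
from enum import Enum


class Phase(Enum):
    A_DESIGN = "a"
    B_PLAN = "b"
    C_MATERIALIZE = "c"
    NOOP = "noop"  # early-noop paths


# MCP servers each phase needs, per the bullets above.
MCP_BY_PHASE = {
    Phase.A_DESIGN: frozenset({"serena", "distillery"}),
    Phase.B_PLAN: frozenset({"serena", "distillery"}),  # Serena: preview only
    Phase.C_MATERIALIZE: frozenset(),
    Phase.NOOP: frozenset(),
}


def plan_load(phase: Phase) -> dict:
    """Return the imports and MCP servers to load for one resolved phase."""
    servers = sorted(MCP_BY_PHASE[phase])
    imports = [f"sdd-mcp-{name}.md" for name in servers]
    if phase is not Phase.NOOP:
        # HYPOTHETICAL file name: one playbook per phase.
        imports.append(f"sdd-triage-phase-{phase.value}.md")
    return {"imports": imports, "mcp_servers": servers}


print(plan_load(Phase.C_MATERIALIZE))
# {'imports': ['sdd-triage-phase-c.md'], 'mcp_servers': []}
```

In this sketch a phase-C or noop run never pays for the Serena or Distillery fragments or their tool schemas.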

B. Pre-fetch the resolved entity's reads (DataOps)

The failing run re-read the same tracking issue 4× via issue_read. The route action already knows item_number, so a deterministic pre-step can materialize the tracking issue body, comments, and sub-issue list into /tmp/gh-aw/ as compact JSON, letting the agent read a file instead of looping on API calls. The pre-step must also cover the varied reads of the 7 situations (arch PR diff, spec files, sub-issue tree), so its design needs to account for each of them.
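A minimal sketch of such a pre-step, under stated assumptions: `fetch` is a hypothetical callable standing in for whatever the wrapper uses to call the GitHub API, and the output file name `issue-<n>.json` is illustrative. It covers only the tracking-issue case, not all 7 situations:

```python
import json
from pathlib import Path
from typing import Any, Callable


def materialize_tracking_issue(
    item_number: int,
    fetch: Callable[[str, int], Any],  # HYPOTHETICAL: fetch(kind, number) -> JSON-able data
    out_dir: Path = Path("/tmp/gh-aw"),
) -> Path:
    """Write one compact JSON snapshot of the tracking issue for the agent to read."""
    snapshot = {
        "number": item_number,
        "body": fetch("body", item_number),
        "comments": fetch("comments", item_number),
        "sub_issues": fetch("sub_issues", item_number),
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"issue-{item_number}.json"
    # Tight separators drop all optional whitespace from the JSON.
    path.write_text(json.dumps(snapshot, separators=(",", ":")), encoding="utf-8")
    return path
```

Injecting `fetch` keeps the step deterministic and testable; each entity is read once up front rather than once per agent turn.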

C. Tool-surface reduction

Evaluate tools.github: mode: gh-proxy to drop the GitHub MCP Docker server and its tool-definition block from every turn. Caveat: this workflow is sub-issue-tree heavy (get_sub_issues parent/child reads), which the gh CLI supports only via raw gh api calls, so gh-proxy is not a clean drop-in and could increase turns. This likely belongs after (A), once per-phase scoping decides which reads each phase needs.

D. Cache hygiene

Investigate the mid-run cache break (calls 29/31/32). Ensure all static fragments strictly precede any dynamic/event content in the compiled prompt so the stable prefix stays cached across turns.
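The ordering requirement can be checked mechanically. A minimal sketch, assuming the compiled prompt is a plain concatenation of fragments (the function names are illustrative, and this tests only text-level prefix stability, not the provider's actual cache behaviour or the cause of the break in calls 29/31/32):

```python
import os


def build_prompt(static_fragments: list[str], dynamic_parts: list[str]) -> str:
    """Concatenate static fragments first so they form a stable prefix."""
    return "\n".join(static_fragments + dynamic_parts)


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the longest common leading substring of two prompts."""
    return len(os.path.commonprefix([a, b]))


def static_prefix_is_stable(
    static_fragments: list[str], dyn_a: list[str], dyn_b: list[str]
) -> bool:
    """True if two prompts with different dynamic content share the whole static block."""
    static_len = len("\n".join(static_fragments))
    prompt_a = build_prompt(static_fragments, dyn_a)
    prompt_b = build_prompt(static_fragments, dyn_b)
    return shared_prefix_len(prompt_a, prompt_b) >= static_len


static = ["phase playbook", "tool definitions"]
print(static_prefix_is_stable(static, ["event: issue 1"], ["event: issue 2"]))  # True
```

A check like this could run against two compiled prompts for different events; any dynamic content placed ahead of a static fragment would shorten the shared prefix and fail it.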

Non-goal: model downgrade

gh aw audit flags "downgrade to haiku" because it classifies this as generic triage. Phase A does real architecture synthesis — do not downgrade. The win is architecture/context, not model tier.

Acceptance

  • A phase-A run on a representative feature stays well under the 25M effective-token rail
  • Serena/Distillery load only on the phases that call them
  • Quality parity: architecture record, plan comment, and Unit/task tree match pre-change output on a validation issue
  • Back the change with an experiments: entry (metric: "aic") comparing before/after
