Design loop: verify file references, require acceptance tests for methodology claims #633

Merged: gnovak merged 1 commit into dev from design-loop-verify-references on Jun 3, 2026

gnovak (Owner) commented May 31, 2026

Third PR in the trio addressing bridge-analysis PR #438's failure modes.

Why fix this here too

If the design hallucinates "the existing logic lives in module X" when it's actually in `notebooks/Y`, that error propagates: into the spec, into the implementation, eventually into a shipped stub. Catching it at the design stage is cheaper than catching it in resolve. The trio of PRs catches the failure mode at every layer it could be caught.

DEFAULT_SYSTEM_PROMPT additions

  • Verify every file/function reference before citing it. Use grep or read_file. If the canonical impl lives in a notebook, say so explicitly — don't pretend it's in a clean module.
  • Methodology claims need named acceptance tests. Any "uses BT+EB" / "applies BH-FDR" claim in the design must come with a proposed acceptance test in the analysis (e.g., `test_leaderboard_matches_bt_eb_reference_within_tolerance` with a tolerance value). Without it, the methodology is a label, not a contract.
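The first rule above (verify references before citing them) could also be checked mechanically after a design is written. The following is a minimal sketch, not part of this PR: `find_unverified_paths`, `PATH_PATTERN`, and the list of file extensions are hypothetical, and it assumes designs cite paths in backticks relative to the repository root.

```python
import re
from pathlib import Path

# Hypothetical pattern: backtick-quoted tokens that look like file paths.
# The extension list is an assumption, not something this PR defines.
PATH_PATTERN = re.compile(r"`([\w./-]+\.(?:py|ipynb|md|toml|yaml|yml|json))`")


def find_unverified_paths(design_text: str, repo_root: Path) -> list[str]:
    """Return cited paths that do not exist under repo_root, in first-seen order."""
    missing = []
    # dict.fromkeys removes duplicate citations while preserving order.
    for cited in dict.fromkeys(PATH_PATTERN.findall(design_text)):
        if not (repo_root / cited).is_file():
            missing.append(cited)
    return missing
```

A design that cites a clean module when the logic actually lives in a notebook would surface here as a non-empty list, which a reviewer or a later pipeline stage could treat as a blocking finding.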

Test plan

  • 4 new tests in `TestDesignPromptFidelityRules`
  • 676 unit tests pass in total
  • Empirical: next `/agent-design` or `/agent-delegate` should produce designs that name file paths precisely and include acceptance tests for methodology claims
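For readers unfamiliar with prompt-level tests, the sketch below shows one plausible shape for them. It is an illustration only: the actual tests in `TestDesignPromptFidelityRules` are not shown on this page, `STAND_IN_PROMPT` is a stand-in for the real `DEFAULT_SYSTEM_PROMPT`, and the class and method names here are hypothetical.

```python
import unittest

# Stand-in text for illustration; the real DEFAULT_SYSTEM_PROMPT is not
# reproduced on this page, so the wording here is assumed.
STAND_IN_PROMPT = """\
Verify every file/function reference before citing it. Use grep or read_file.
Methodology claims need named acceptance tests.
"""


class TestDesignPromptFidelityRulesSketch(unittest.TestCase):
    """Hypothetical checks that the prompt still carries both fidelity rules."""

    def test_prompt_requires_reference_verification(self):
        self.assertIn("Verify every file/function reference", STAND_IN_PROMPT)

    def test_prompt_requires_named_acceptance_tests(self):
        self.assertIn("named acceptance tests", STAND_IN_PROMPT)
```

Tests of this kind only guard against the rules being dropped from the prompt; they say nothing about whether generated designs follow them, which is why the empirical check in the list above is still needed.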

🤖 Generated with Claude Code

…hodology claims

Third PR in the trio addressing bridge-analysis PR #438's failure modes.
PR #631 hardened resolve. PR #632 hardened spec generation. This PR
hardens the design loop — the earliest point in the pipeline where
methodology claims and file references first get written down.

If the design hallucinates "the existing logic lives in module X" when
it's actually in notebooks/Y, that error propagates: into the spec, into
the implementation, eventually into a shipped stub. Catching it at the
design stage is cheaper than catching it in resolve.

## DEFAULT_SYSTEM_PROMPT additions

- **Verify every file/function reference before citing it.** Use grep
  or read_file. If the canonical impl lives in a notebook, say so
  explicitly — don't pretend it's in a clean module.

- **Methodology claims need named acceptance tests.** Any "uses BT+EB"
  / "applies BH-FDR" claim in the design must come with a proposed
  acceptance test in the analysis (e.g.,
  test_leaderboard_matches_bt_eb_reference_within_tolerance with a
  tolerance value). Without it, the methodology is a label, not a
  contract, and the implementer agent can ship a stand-in that the
  existing test suite won't catch.
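
A named acceptance test of the kind described above might look like the following sketch. Only the test name comes from this PR; `max_abs_deviation`, the `TOLERANCE` value, and the placeholder scores are assumptions for illustration. A real test would compare the pipeline's leaderboard against output from a trusted reference implementation of the claimed method.

```python
TOLERANCE = 1e-6  # assumed value; the design should state its own tolerance


def max_abs_deviation(candidate: dict[str, float], reference: dict[str, float]) -> float:
    """Largest absolute per-item difference between two score tables."""
    if candidate.keys() != reference.keys():
        raise ValueError("candidate and reference cover different items")
    return max(abs(candidate[name] - reference[name]) for name in reference)


def test_leaderboard_matches_bt_eb_reference_within_tolerance():
    # Placeholder scores: in practice, `reference` would come from a trusted
    # implementation of the method and `candidate` from the shipped pipeline.
    reference = {"model_a": 0.62, "model_b": 0.38}
    candidate = {"model_a": 0.62, "model_b": 0.38}
    assert max_abs_deviation(candidate, reference) <= TOLERANCE
```

Because the test is tied to reference output rather than to the pipeline's own results, a stand-in implementation that merely produces plausible numbers would fail it.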

## Tests

4 new tests in TestDesignPromptFidelityRules. 676 total pass.

Same diagnostic synthesis as PR #631 and #632 — derived from the
bridge-analysis agent's postmortem + my analysis. The three together
address the failure mode at every layer it could be caught: design
generation (this PR), spec generation (#632), implementation (#631).
@gnovak gnovak merged commit de72856 into dev Jun 3, 2026
@gnovak gnovak deleted the design-loop-verify-references branch June 13, 2026 03:43