
feat(verify): structural verification gate with edit->test->fix loop …#19

Merged
wusijian007 merged 1 commit into main from feat/m3.3-self-correction on Jun 16, 2026


@wusijian007


…(M3.3 / §4)

Third v3 milestone (§4 Self-Correction). Upgrades verification from model opt-in (the verifier sub-agent, which the model may or may not spawn) to a structural loop gate. Also includes the written §4 design section in `docs/v3-kernel-roadmap.md` (design before code, per the roadmap's blast-radius tiering).

M3.3a -- verification gate (`query.ts`):
`QueryOptions.verify = { command, args, when?, maxBounces? }`. On the
completion path (the model emits no `tool_uses`, i.e. "I'm done"), the loop
no longer returns immediately. Instead it runs the verify command via
`ToolContext.executor` (the M2.1 seam, NOT the whitelisted Bash tool, so
arbitrary `npm test` / `tsc --noEmit` is allowed). Exit 0 -> the run completes.
Non-zero -> the failure is injected as a reflective user turn
(`reflectiveVerifyFailure`: command + exit code + truncated output + "locate
and fix; I'll re-run") and the loop continues into an edit->test->fix cycle.
The cycle is bounded by `maxBounces` (default 2); exceeding it (or running out
of turns) ends in a `verification_failed` terminal state rather than a
silent `completed`. Each check yields a `verification` LoopEvent;
bounces also emit a `query.verify_bounce` profile mark.
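The gate behaviour described above can be sketched roughly as follows. This is a minimal TypeScript sketch under stated assumptions, not the code in `query.ts`: `VerifyOptions`, `ExecResult`, `CommandExecutor.run`, `verifyGate`, `continueTurn` and `MAX_OUTPUT_CHARS` are hypothetical names, the `when?` option is omitted, and a single `continueTurn` call stands in for however many model turns the edit->test->fix cycle takes before the model claims completion again.

```typescript
// Hypothetical shapes: the real QueryOptions.verify and CommandExecutor may differ.
interface VerifyOptions {
  command: string;
  args: string[];
  maxBounces?: number; // default 2, per the PR description
}

interface ExecResult {
  exitCode: number;
  output: string;
}

interface CommandExecutor {
  run(command: string, args: string[]): Promise<ExecResult>;
}

type TerminalStatus = "completed" | "verification_failed";

// Hypothetical truncation limit for command output fed back to the model.
const MAX_OUTPUT_CHARS = 2000;

// Builds the reflective user turn: command + exit code + truncated output + instruction.
function reflectiveVerifyFailure(cmd: string, res: ExecResult): string {
  const output =
    res.output.length > MAX_OUTPUT_CHARS
      ? res.output.slice(0, MAX_OUTPUT_CHARS) + "\n[output truncated]"
      : res.output;
  return [
    `Verification failed: \`${cmd}\` exited with code ${res.exitCode}.`,
    output,
    "Locate and fix the cause; I'll re-run the check.",
  ].join("\n");
}

// Runs when the model emits no tool_uses ("I'm done").
// continueTurn(msg) injects msg as a user turn and lets the model work until it
// claims completion again; it resolves false if the turn budget ran out.
async function verifyGate(
  verify: VerifyOptions,
  executor: CommandExecutor,
  continueTurn: (userMessage: string) => Promise<boolean>,
): Promise<TerminalStatus> {
  const maxBounces = verify.maxBounces ?? 2;
  const cmd = [verify.command, ...verify.args].join(" ");
  for (let bounces = 0; ; bounces++) {
    const res = await executor.run(verify.command, verify.args);
    if (res.exitCode === 0) return "completed";
    // Out of bounces: end as verification_failed instead of a silent "completed".
    if (bounces >= maxBounces) return "verification_failed";
    const hadTurns = await continueTurn(reflectiveVerifyFailure(cmd, res));
    if (!hadTurns) return "verification_failed";
  }
}
```

In this sketch, with the default `maxBounces` of 2, a command that keeps failing is run three times (the initial check plus two bounces) before the run ends as `verification_failed`.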

M3.3b -- CLI + eval:
`myagent agent --verify "<command>"` parses into `QueryOptions.verify`
(whitespace split), and the CLI prints a `[verify]` line per check. A 7th
eval task, "self-correction", drives edit -> verify-fail -> fix ->
verify-pass through an injected scripted mock executor (exit codes
[1, 0]); it is fully deterministic and offline. The eval-gate fingerprint is
updated (tasks 7, turns 15, in 10800, out 625).
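The whitespace-split parsing of `--verify` can be illustrated with a small helper. This is a sketch assuming the flag value arrives as a single string; `parseVerifyFlag` and this reduced `VerifyOptions` shape are hypothetical, not the CLI's actual code.

```typescript
// Hypothetical shape of the parsed option (maxBounces/when left at their defaults).
interface VerifyOptions {
  command: string;
  args: string[];
}

// Splits the --verify value on runs of whitespace: the first token is the
// executable, the rest are its arguments. No shell quoting is interpreted.
function parseVerifyFlag(value: string): VerifyOptions {
  const tokens = value.trim().split(/\s+/).filter((t) => t.length > 0);
  if (tokens.length === 0) {
    throw new Error("--verify needs a non-empty command");
  }
  const [command, ...args] = tokens;
  return { command, args };
}
```

For instance, `parseVerifyFlag("npx tsc --noEmit")` yields `command: "npx"` and `args: ["tsc", "--noEmit"]`. A plain whitespace split does not interpret shell quoting, so an argument that itself contains spaces would be broken into several tokens; a wrapper script is one way around that.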

Determinism (invariant #2): the gate runs through `ToolContext.executor`, so the eval and the 4 new query-loop tests inject a mock `CommandExecutor` instead of spawning real processes. Adds a new `TerminalState` status, `"verification_failed"`, and a `VerificationEvent` to the `LoopEvent` union.
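A scripted mock executor of the kind described, together with the union additions, might look like the sketch below. It assumes a promise-based `CommandExecutor` with a single `run(command, args)` method; `ScriptedExecutor`, `ExecResult`, `toVerificationEvent`, `describeTerminal`, the event fields and `lastExitCode` are all hypothetical, and the repo's actual types may be shaped differently.

```typescript
// Hypothetical executor seam (stand-in for the real CommandExecutor).
interface ExecResult {
  exitCode: number;
  output: string;
}
interface CommandExecutor {
  run(command: string, args: string[]): Promise<ExecResult>;
}

// Illustrative union additions; field names are assumptions.
type VerificationEvent = { type: "verification"; passed: boolean; exitCode: number };
type TextEvent = { type: "text"; text: string };
type LoopEvent = TextEvent | VerificationEvent;
type TerminalState =
  | { status: "completed" }
  | { status: "verification_failed"; lastExitCode: number };

// Replays a fixed script of exit codes (e.g. [1, 0]) and records every call,
// so tests and evals never spawn a real process.
class ScriptedExecutor implements CommandExecutor {
  readonly calls: string[][] = [];
  private readonly exitCodes: number[];
  private next = 0;

  constructor(exitCodes: number[]) {
    this.exitCodes = exitCodes;
  }

  async run(command: string, args: string[]): Promise<ExecResult> {
    this.calls.push([command, ...args]);
    if (this.next >= this.exitCodes.length) {
      throw new Error(`script exhausted after ${this.exitCodes.length} calls`);
    }
    const exitCode = this.exitCodes[this.next++];
    return { exitCode, output: exitCode === 0 ? "all tests passed" : "1 test failed" };
  }
}

// Maps one check result onto the new LoopEvent member.
function toVerificationEvent(res: ExecResult): LoopEvent {
  return { type: "verification", passed: res.exitCode === 0, exitCode: res.exitCode };
}

// Narrows on the discriminant so each terminal status is handled explicitly.
function describeTerminal(state: TerminalState): string {
  switch (state.status) {
    case "completed":
      return "completed";
    case "verification_failed":
      return `verification_failed (last exit ${state.lastExitCode})`;
  }
}
```

Throwing once the script is exhausted makes an unexpected extra check fail the test loudly rather than silently reusing the last exit code.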

The finalize critic pass remains deferred to a §4 follow-up.

Local: 201 tests, 3/3 green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
wusijian007 merged commit 5df909d into main on Jun 16, 2026
3 checks passed
wusijian007 deleted the feat/m3.3-self-correction branch on June 16, 2026 at 13:47