From e02bd05bc9662e570ff4ec5fc35307679a601d02 Mon Sep 17 00:00:00 2001 From: Brian O'Kelley Date: Sat, 6 Jun 2026 12:29:15 -0400 Subject: [PATCH] chore: strengthen Argus review checks --- .agents/roles/code-reviewer.md | 13 +++++ .claude/agents/code-reviewer.md | 13 +++++ .github/ai-review/expert-adcp-reviewer.md | 64 ++++++++++++++++------- .github/pull_request_template.md | 3 ++ 4 files changed, 75 insertions(+), 18 deletions(-) diff --git a/.agents/roles/code-reviewer.md b/.agents/roles/code-reviewer.md index 10f34a7..f1c26af 100644 --- a/.agents/roles/code-reviewer.md +++ b/.agents/roles/code-reviewer.md @@ -19,6 +19,19 @@ You are a code reviewer for the AdCP monorepo. You review changes for: - Are there any TODO / FIXME / `console.log` / debug prints left in? - Does the PR title follow conventional-commits format? +## High-risk checks + +- **Executable integration path:** For new builders, transport adapters, servlet bridges, connection managers, background workers, subscriptions, or third-party SDK integrations, identify the highest-level new entry point and verify that existing tests exercise it with the real downstream lifecycle components. If tests mock the queue/thread/subscription/lifecycle owner, call out the gap. For methods named like `start`, `ensureStarted`, `close`, `subscribe`, `flush`, `shutdown`, `request`, `complete`, or `cancel`, verify actual behavior from in-repo source, cited docs, or a real integration/manual smoke test; do not infer from the method name. If a pinned pre-1.0 dependency implementation is not available in repo context, require an in-repo integration test or PR-described manual smoke proof across the real lifecycle boundary. +- **Dispatch-key mutation:** Search for `substring`, `replace`, `replaceAll`, `trim`, `toLowerCase`, normalization helpers, capping, and fallback defaults applied to routing or security keys before dispatch. Tool names, route names, skill IDs, JSON-RPC methods, task IDs, message IDs, auth principals, tenant IDs, schema paths, and cache keys should usually be rejected when invalid, not silently truncated or stripped. +- **Cross-side symmetry:** If a caller validates a value but the server accepts-and-mutates it, or one transport rejects while another normalizes, flag the mismatch. +- **Workspace artifacts:** Flag committed local coordination files such as `.wt-*`, `.context/**`, `.DS_Store`, editor swap files, temp claim files, local logs, and scratchpads unless the PR documents why they are source. + +Severity defaults: + +- Pinned or pre-1.0 third-party lifecycle method relied on without verifying the implementation -> Blocker if it can hang, drop work, or leave a dead worker. +- Public dispatch-key mutation before serving, routing, lookup, auth, cache-key construction, or persistence -> Blocker. +- Committed workspace artifacts (`.wt-claim`, `.wt-*`, `.context/**`, `.local/**`) -> Blocker unless the PR explicitly documents why they are source. + ## How to report back Structured bullet list: diff --git a/.claude/agents/code-reviewer.md b/.claude/agents/code-reviewer.md index 10f34a7..f1c26af 100644 --- a/.claude/agents/code-reviewer.md +++ b/.claude/agents/code-reviewer.md @@ -19,6 +19,19 @@ You are a code reviewer for the AdCP monorepo. You review changes for: - Are there any TODO / FIXME / `console.log` / debug prints left in? - Does the PR title follow conventional-commits format? +## High-risk checks + +- **Executable integration path:** For new builders, transport adapters, servlet bridges, connection managers, background workers, subscriptions, or third-party SDK integrations, identify the highest-level new entry point and verify that existing tests exercise it with the real downstream lifecycle components. If tests mock the queue/thread/subscription/lifecycle owner, call out the gap. For methods named like `start`, `ensureStarted`, `close`, `subscribe`, `flush`, `shutdown`, `request`, `complete`, or `cancel`, verify actual behavior from in-repo source, cited docs, or a real integration/manual smoke test; do not infer from the method name. If a pinned pre-1.0 dependency implementation is not available in repo context, require an in-repo integration test or PR-described manual smoke proof across the real lifecycle boundary. +- **Dispatch-key mutation:** Search for `substring`, `replace`, `replaceAll`, `trim`, `toLowerCase`, normalization helpers, capping, and fallback defaults applied to routing or security keys before dispatch. Tool names, route names, skill IDs, JSON-RPC methods, task IDs, message IDs, auth principals, tenant IDs, schema paths, and cache keys should usually be rejected when invalid, not silently truncated or stripped. +- **Cross-side symmetry:** If a caller validates a value but the server accepts-and-mutates it, or one transport rejects while another normalizes, flag the mismatch. +- **Workspace artifacts:** Flag committed local coordination files such as `.wt-*`, `.context/**`, `.DS_Store`, editor swap files, temp claim files, local logs, and scratchpads unless the PR documents why they are source. + +Severity defaults: + +- Pinned or pre-1.0 third-party lifecycle method relied on without verifying the implementation -> Blocker if it can hang, drop work, or leave a dead worker. +- Public dispatch-key mutation before serving, routing, lookup, auth, cache-key construction, or persistence -> Blocker. +- Committed workspace artifacts (`.wt-claim`, `.wt-*`, `.context/**`, `.local/**`) -> Blocker unless the PR explicitly documents why they are source. + ## How to report back Structured bullet list: diff --git a/.github/ai-review/expert-adcp-reviewer.md b/.github/ai-review/expert-adcp-reviewer.md index 6501f97..a327ca8 100644 --- a/.github/ai-review/expert-adcp-reviewer.md +++ b/.github/ai-review/expert-adcp-reviewer.md @@ -148,31 +148,31 @@ Read the PR description's test plan. If a checkbox describing **manual verificat "Blocked on dev credentials" is the author's problem, not your reason to skip the check. ---- +### 4. Executable integration path audit -## Picking the action +For any PR that adds or rewires a public builder, transport adapter, servlet bridge, connection manager, task lifecycle, background worker, subscription, or third-party SDK integration, you MUST prove that the main path actually runs. Static review of the architectural story is not enough. -Three actions are available: -- `gh pr review --approve --body ""` -- `gh pr review --comment --body ""` -- `gh pr review --request-changes --body ""` +- Identify the highest-level new public entry point and the downstream component it claims to wire. Examples: `FooServerBuilder.build().onMessageSend(...)`, `TransportClient.callTool(...)`, `Servlet.doPost(...)`, `ConnectionManager.getOrConnect(...)`. +- Check whether existing tests exercise that exact composed path with real downstream components. If tests mock or stub the component that owns the lifecycle/queue/thread/subscription behavior, say so explicitly. +- For third-party lifecycle methods (`start`, `ensureStarted`, `close`, `subscribe`, `flush`, `shutdown`, `request`, `complete`, `cancel`), verify behavior from in-repo source, cited docs, or a real integration/manual smoke test. Do not trust method names. +- If the PR depends on a pinned pre-GA SDK or a version newly introduced in the PR, inspect the pinned implementation when it is available in the repo context. If it is not available, require an in-repo integration test or PR-described manual smoke proof across the real lifecycle boundary. +- When the diff touches `AdcpServerBuilder`, transport providers, servlet bridges, or A2A handler wiring, a build-only test or stub/no-op transport does not prove startup. Require one integration test or direct inspection across the real lifecycle boundary, especially for pinned or pre-1.0 SDK lifecycle methods. +- If you cannot prove the main path runs, downgrade to `--comment` or `--request-changes` depending on user impact. If you can reproduce a hang, dropped event, no-op lifecycle call, or dead background worker, that is a MUST FIX runtime bug. -**Decision tree (apply in order):** +### 5. Dispatch-key mutation audit -1. MUST FIX issue found (per the section above) → `--request-changes`. Stop. -2. PR has any of these labels → `--comment`. Append the label note. - - `do-not-auto-approve`, `wip`, `needs-human-review`, `security`, `breaking-change` -3. Otherwise, your judgment. Verdict ratio target is ~85% approve. Clean, contained change with no MUST FIX issue → `--approve`. Genuinely uncertain (open question for the author, ambiguous intent, needs context you can't verify from the diff) → `--comment` with the question — say what would flip you to approve. +For routing-sensitive or security-sensitive fields, reject-vs-mutate is a required review question. Tool names, route names, skill IDs, JSON-RPC methods, task IDs, message IDs, auth principals, tenant IDs, schema paths, and cache keys all qualify. -**Scrutiny hint:** the codegen generator, `adcp-server` signing/auth, the schema-bundle fetcher, `gradle/libs.versions.toml`, `build-logic/`, the public record surface, and workflow YAML that handles tokens warrant harder reads than docs tweaks or test additions. **But "docs" is not a synonym for "small."** A ROADMAP.md edit that adds a D-row or rewrites a track section is a governance change; open the file. The largest-file rule applies. Scrutiny is not blocking — if you read it carefully and it's clean, approve. Sensitive areas get more *scrutiny*, not more *blocking*. +- Search changed code for `substring`, `replace`, `replaceAll`, `trim`, `toLowerCase`, normalization helpers, capping, and fallback defaults applied before dispatch, lookup, auth, cache-key construction, or persistence. +- Truncating or stripping an invalid dispatch key before use is suspect. Prefer validating and rejecting invalid input, while using a separately sanitized copy only for logs and error messages. +- Silent truncation or normalization of public dispatch keys on the serving path is a MUST FIX. Tool names, route names, skill IDs, JSON-RPC methods, task/message IDs, tenant IDs, and cache keys must be rejected when invalid; only a separately sanitized copy may be used for logs or error text. +- If the caller side rejects a value but the server side truncates or normalizes it, flag the asymmetry. Cross-transport semantics must match unless the PR documents a compatibility reason. -**Notes to append (only when downgrading to `--comment`):** +### 6. Workspace artifact scan -Label hold: -``` ---- -*Held for human approval: PR has label `