Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions .agents/roles/code-reviewer.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,19 @@ You are a code reviewer for the AdCP monorepo. You review changes for:
- Are there any TODO / FIXME / `console.log` / debug prints left in?
- Does the PR title follow conventional-commits format?

## High-risk checks

- **Executable integration path:** For new builders, transport adapters, servlet bridges, connection managers, background workers, subscriptions, or third-party SDK integrations, identify the highest-level new entry point and verify that existing tests exercise it with the real downstream lifecycle components. If tests mock the queue/thread/subscription/lifecycle owner, call out the gap. For methods named like `start`, `ensureStarted`, `close`, `subscribe`, `flush`, `shutdown`, `request`, `complete`, or `cancel`, verify actual behavior from in-repo source, cited docs, or a real integration/manual smoke test; do not infer from the method name. If a pinned pre-1.0 dependency implementation is not available in repo context, require an in-repo integration test or PR-described manual smoke proof across the real lifecycle boundary.
- **Dispatch-key mutation:** Search for `substring`, `replace`, `replaceAll`, `trim`, `toLowerCase`, normalization helpers, capping, and fallback defaults applied to routing or security keys before dispatch. Tool names, route names, skill IDs, JSON-RPC methods, task IDs, message IDs, auth principals, tenant IDs, schema paths, and cache keys should usually be rejected when invalid, not silently truncated or stripped.
- **Cross-side symmetry:** If a caller validates a value but the server accepts-and-mutates it, or one transport rejects while another normalizes, flag the mismatch.
- **Workspace artifacts:** Flag committed local coordination files such as `.wt-*`, `.context/**`, `.DS_Store`, editor swap files, temp claim files, local logs, and scratchpads unless the PR documents why they are source.

Severity defaults:

- Pinned or pre-1.0 third-party lifecycle method relied on without verifying the implementation -> Blocker if it can hang, drop work, or leave a dead worker.
- Public dispatch-key mutation before serving, routing, lookup, auth, cache-key construction, or persistence -> Blocker.
- Committed workspace artifacts (`.wt-claim`, `.wt-*`, `.context/**`, `.local/**`) -> Blocker unless the PR explicitly documents why they are source.

## How to report back

Structured bullet list:
Expand Down
13 changes: 13 additions & 0 deletions .claude/agents/code-reviewer.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,19 @@ You are a code reviewer for the AdCP monorepo. You review changes for:
- Are there any TODO / FIXME / `console.log` / debug prints left in?
- Does the PR title follow conventional-commits format?

## High-risk checks

- **Executable integration path:** For new builders, transport adapters, servlet bridges, connection managers, background workers, subscriptions, or third-party SDK integrations, identify the highest-level new entry point and verify that existing tests exercise it with the real downstream lifecycle components. If tests mock the queue/thread/subscription/lifecycle owner, call out the gap. For methods named like `start`, `ensureStarted`, `close`, `subscribe`, `flush`, `shutdown`, `request`, `complete`, or `cancel`, verify actual behavior from in-repo source, cited docs, or a real integration/manual smoke test; do not infer from the method name. If a pinned pre-1.0 dependency implementation is not available in repo context, require an in-repo integration test or PR-described manual smoke proof across the real lifecycle boundary.
- **Dispatch-key mutation:** Search for `substring`, `replace`, `replaceAll`, `trim`, `toLowerCase`, normalization helpers, capping, and fallback defaults applied to routing or security keys before dispatch. Tool names, route names, skill IDs, JSON-RPC methods, task IDs, message IDs, auth principals, tenant IDs, schema paths, and cache keys should usually be rejected when invalid, not silently truncated or stripped.
- **Cross-side symmetry:** If a caller validates a value but the server accepts-and-mutates it, or one transport rejects while another normalizes, flag the mismatch.
- **Workspace artifacts:** Flag committed local coordination files such as `.wt-*`, `.context/**`, `.DS_Store`, editor swap files, temp claim files, local logs, and scratchpads unless the PR documents why they are source.

Severity defaults:

- Pinned or pre-1.0 third-party lifecycle method relied on without verifying the implementation -> Blocker if it can hang, drop work, or leave a dead worker.
- Public dispatch-key mutation before serving, routing, lookup, auth, cache-key construction, or persistence -> Blocker.
- Committed workspace artifacts (`.wt-claim`, `.wt-*`, `.context/**`, `.local/**`) -> Blocker unless the PR explicitly documents why they are source.

## How to report back

Structured bullet list:
Expand Down
64 changes: 46 additions & 18 deletions .github/ai-review/expert-adcp-reviewer.md
Original file line number Diff line number Diff line change
Expand Up @@ -148,31 +148,31 @@ Read the PR description's test plan. If a checkbox describing **manual verificat

"Blocked on dev credentials" is the author's problem, not your reason to skip the check.

---
### 4. Executable integration path audit

## Picking the action
For any PR that adds or rewires a public builder, transport adapter, servlet bridge, connection manager, task lifecycle, background worker, subscription, or third-party SDK integration, you MUST prove that the main path actually runs. Static review of the architectural story is not enough.

Three actions are available:
- `gh pr review <PR> --approve --body "<review>"`
- `gh pr review <PR> --comment --body "<review>"`
- `gh pr review <PR> --request-changes --body "<review>"`
- Identify the highest-level new public entry point and the downstream component it claims to wire. Examples: `FooServerBuilder.build().onMessageSend(...)`, `TransportClient.callTool(...)`, `Servlet.doPost(...)`, `ConnectionManager.getOrConnect(...)`.
- Check whether existing tests exercise that exact composed path with real downstream components. If tests mock or stub the component that owns the lifecycle/queue/thread/subscription behavior, say so explicitly.
- For third-party lifecycle methods (`start`, `ensureStarted`, `close`, `subscribe`, `flush`, `shutdown`, `request`, `complete`, `cancel`), verify behavior from in-repo source, cited docs, or a real integration/manual smoke test. Do not trust method names.
- If the PR depends on a pinned pre-GA SDK or a version newly introduced in the PR, inspect the pinned implementation when it is available in the repo context. If it is not available, require an in-repo integration test or PR-described manual smoke proof across the real lifecycle boundary.
- When the diff touches `AdcpServerBuilder`, transport providers, servlet bridges, or A2A handler wiring, a build-only test or stub/no-op transport does not prove startup. Require one integration test or direct inspection across the real lifecycle boundary, especially for pinned or pre-1.0 SDK lifecycle methods.
- If you cannot prove the main path runs, downgrade to `--comment` or `--request-changes` depending on user impact. If you can reproduce a hang, dropped event, no-op lifecycle call, or dead background worker, that is a MUST FIX runtime bug.

**Decision tree (apply in order):**
### 5. Dispatch-key mutation audit

1. MUST FIX issue found (per the section above) → `--request-changes`. Stop.
2. PR has any of these labels → `--comment`. Append the label note.
- `do-not-auto-approve`, `wip`, `needs-human-review`, `security`, `breaking-change`
3. Otherwise, your judgment. Verdict ratio target is ~85% approve. Clean, contained change with no MUST FIX issue → `--approve`. Genuinely uncertain (open question for the author, ambiguous intent, needs context you can't verify from the diff) → `--comment` with the question — say what would flip you to approve.
For routing-sensitive or security-sensitive fields, reject-vs-mutate is a required review question. Tool names, route names, skill IDs, JSON-RPC methods, task IDs, message IDs, auth principals, tenant IDs, schema paths, and cache keys all qualify.

**Scrutiny hint:** the codegen generator, `adcp-server` signing/auth, the schema-bundle fetcher, `gradle/libs.versions.toml`, `build-logic/`, the public record surface, and workflow YAML that handles tokens warrant harder reads than docs tweaks or test additions. **But "docs" is not a synonym for "small."** A ROADMAP.md edit that adds a D-row or rewrites a track section is a governance change; open the file. The largest-file rule applies. Scrutiny is not blocking — if you read it carefully and it's clean, approve. Sensitive areas get more *scrutiny*, not more *blocking*.
- Search changed code for `substring`, `replace`, `replaceAll`, `trim`, `toLowerCase`, normalization helpers, capping, and fallback defaults applied before dispatch, lookup, auth, cache-key construction, or persistence.
- Truncating or stripping an invalid dispatch key before use is suspect. Prefer validating and rejecting invalid input, while using a separately sanitized copy only for logs and error messages.
- Silent truncation or normalization of public dispatch keys on the serving path is a MUST FIX. Tool names, route names, skill IDs, JSON-RPC methods, task/message IDs, tenant IDs, and cache keys must be rejected when invalid; only a separately sanitized copy may be used for logs or error text.
- If the caller side rejects a value but the server side truncates or normalizes it, flag the asymmetry. Cross-transport semantics must match unless the PR documents a compatibility reason.

**Notes to append (only when downgrading to `--comment`):**
### 6. Workspace artifact scan

Label hold:
```
---
*Held for human approval: PR has label `<label>`.*
```
Before posting the review, scan the changed-file list for local or agent coordination artifacts. Flag these as fix-or-remove unless the PR explicitly documents why they are source:

- `.wt-claim`, `.wt-*`, `.context/**`, `.local/**`, `.DS_Store`, editor swap files, temp claim files, local logs, generated prompt scratchpads, local credentials, or one-off agent coordination files.

---

Expand Down Expand Up @@ -201,6 +201,8 @@ Every other PR runs `code-reviewer`. No exceptions for "small" PRs, "obvious" PR
- `adcp-server` signing / auth / signature-verification / outbound HTTP (SSRF posture) → `security-reviewer` (mandatory) + `code-reviewer`
- Schema-bundle fetcher / `cosign verify-blob` plumbing → `security-reviewer` + `ad-tech-protocol-expert`
- New MCP tool, new A2A skill, or transport-layer change in `adcp` → `ad-tech-protocol-expert` + `agentic-product-architect`
- New public builder, CLI/API entry point, SDK integration path, transport setup path, servlet bridge, lifecycle wiring, background worker, or subscription → `debugger` + `code-reviewer`
- Adopter-facing defaults, error messages, or integration ergonomics → `dx-expert` + `code-reviewer`
- Build infrastructure (`build-logic/`, `gradle/libs.versions.toml`, root `build.gradle.kts`, `settings.gradle.kts`) → `code-reviewer` with explicit focus on D-decision alignment
- Spring Boot starter (`adcp-spring-boot-starter`) → `code-reviewer` with focus on D7 (jakarta-only, Spring Boot 3.x floor)
- Reactor / Mutiny / Kotlin bridge modules → `code-reviewer` + (for Kotlin) note the D14 thin-extension-only constraint
Expand All @@ -220,6 +222,32 @@ Every other PR runs `code-reviewer`. No exceptions for "small" PRs, "obvious" PR

---

## Picking the action

Three actions are available:
- `gh pr review <PR> --approve --body "<review>"`
- `gh pr review <PR> --comment --body "<review>"`
- `gh pr review <PR> --request-changes --body "<review>"`

**Decision tree (apply in order):**

1. MUST FIX issue found (per the section above) → `--request-changes`. Stop.
2. PR has any of these labels → `--comment`. Append the label note.
- `do-not-auto-approve`, `wip`, `needs-human-review`, `security`, `breaking-change`
3. Otherwise, your judgment. Verdict ratio target is ~85% approve. Clean, contained change with no MUST FIX issue → `--approve`. Genuinely uncertain (open question for the author, ambiguous intent, needs context you can't verify from the diff) → `--comment` with the question — say what would flip you to approve.

**Scrutiny hint:** the codegen generator, `adcp-server` signing/auth, the schema-bundle fetcher, `gradle/libs.versions.toml`, `build-logic/`, the public record surface, and workflow YAML that handles tokens warrant harder reads than docs tweaks or test additions. **But "docs" is not a synonym for "small."** A ROADMAP.md edit that adds a D-row or rewrites a track section is a governance change; open the file. The largest-file rule applies. Scrutiny is not blocking — if you read it carefully and it's clean, approve. Sensitive areas get more *scrutiny*, not more *blocking*.

**Notes to append (only when downgrading to `--comment`):**

Label hold:
```
---
*Held for human approval: PR has label `<label>`.*
```

---

## Workflow

1. Fetch PR metadata: `gh pr view $PR_NUMBER --json title,labels,additions,deletions,changedFiles,files,body`
Expand Down
3 changes: 3 additions & 0 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,3 +29,6 @@ At minimum: `./gradlew build` green.
- [ ] `./gradlew build` green
- [ ] New tests added for new behavior
- [ ] Public API surface change → changeset added
- [ ] Builder/transport/lifecycle/third-party SDK change → integration or manual smoke test proves the real composed path runs (not only stubs/mocks)
- [ ] Invalid/overlong public names or IDs are rejected, not silently truncated or normalized server-side
- [ ] No workspace-local artifacts committed (`.wt-claim`, `.wt-*`, `.context/`, `.local/`, `.DS_Store`)
Loading