Review V2: multi-reviewer worker review cockpit

@AgentWrapper @harshitsinghbhandari @neversettle17-101

## Context

PR #197 introduced V1 of AO's worker-reviewer system: a worker session can have its PR reviewed by a reviewer harness, review runs are persisted, the reviewer can submit a verdict/body, and the UI/API can attach to the reviewer pane.

V2 should build on that foundation. The goal is not to build a smarter reviewer model. The goal is to make AO a better review operations layer around worker sessions: more visible, controllable, auditable, and ergonomic than opaque review bots.

## Segment 1: V2 Worker-Reviewer Capabilities

### Multi-reviewer support per worker/commit

Support multiple configured reviewers for the same worker PR head SHA.

Example project config shape:

```yaml
review:
  mode: auto
  reviewers:
    - harness: claude-code
      required: true
    - harness: greptile
      required: false
    - harness: codex
      required: false
```

Expected behavior:

- Triggering a review creates a grouped review batch for `worker session + PR head SHA`.
- Each selected reviewer gets its own review run.
- Re-triggering the same worker/SHA should reuse existing runs unless explicitly retried.
- Storage uniqueness should move toward `session_id + target_sha + reviewer_harness + attempt`.

### Reviewer selection and fallback behavior

Keep the initial policy model simple.

Review-level mode:

- `auto`: AO may automatically trigger reviews when the worker PR reaches a reviewable state or the PR head changes.
- `manual`: AO never auto-triggers review; the user explicitly starts review from the UI/API/CLI.

Reviewer-level requirement:

- `required: true`: this reviewer is blocking for aggregate review status and future merge readiness.
- `required: false`: this reviewer is advisory; its result is visible, but it should not block the aggregate required-review gate.

Fallback behavior:

- If `review.reviewers` is configured and non-empty, use that list.
- If `review.reviewers` is not configured, keep V1 behavior: resolve the reviewer harness from the worker harness when supported by the reviewer registry.
- If the worker harness is not a supported reviewer harness, fall back to the default reviewer harness, currently `claude-code`.

This avoids confusing modes like `all`, `primary_with_fallback`, and `required_optional` for V2. Running all configured reviewers is implied by listing multiple reviewers. Required vs optional is represented per reviewer with `required`.

### Post-onboarding reviewer config edits

V2 should account for changing review config after a project is already registered/onboarded.

Open design question:

- How should users add/remove reviewer harnesses or change `required` after project creation?
- Should this be supported through existing project config editing, a dedicated CLI command, the UI, or all three?
- What happens to existing review batches when reviewer config changes? Existing historical runs should remain intact; new triggers should use the latest config.

This does not need to block the first backend schema work, but it should be considered part of the V2 product flow.

### Explicit reviewer/run states

Keep lifecycle states minimal and tied to real flows. Do not add states until there is a concrete state transition that uses them.

Initial status vocabulary:

- `launching`: AO accepted the trigger and is preparing/starting the reviewer.
- `running`: reviewer process/pane exists and review is in progress.
- `submitted`: reviewer submitted a result.
- `failed`: launch/runtime/reviewer flow failed.
- `cancelled`: user cancelled the run.
- `stale`: PR head changed after the run was created/submitted.

Do not add `queued` until AO has a real queue/backpressure system. Until then, `launching` covers ?requested but not running yet.?

Do not add `waiting_input` in the first pass unless we can reliably derive it from reviewer/runtime signals. If we can derive it for an interactive harness later, add it then.

Keep `verdict` separate from lifecycle state:

- `approved`
- `changes_requested`
- empty/none while pending

This lets AO say things like:

- "Launching Claude reviewer"
- "Reviewing commit abc123"
- "Submitted changes requested"
- "Stale: worker pushed a newer commit"
- "Failed to launch: Claude binary not found"

### CLI trigger command

Add explicit CLI support for manual review triggering:

```bash
ao review trigger <worker-session-id>
ao review trigger --session <worker-session-id>
```

In `manual` mode, this is the primary trigger path. In `auto` mode, AO automation or a worker session may use the same daemon route when review should start.

Open discussion:

- Should workers directly run `ao review trigger` when auto mode is enabled, or should the daemon trigger from observed PR/session state?
- Should initial V2 support flags like `--reviewer`, `--retry`, or keep trigger minimal first?

### Cancel and retry controls

Add review run controls:

```http
POST /api/v1/sessions/{sessionId}/reviews/{runId}/cancel
POST /api/v1/sessions/{sessionId}/reviews/{runId}/retry
```

Cancel behavior:

- Preserve history; do not delete the run.
- Mark the run `cancelled`.
- If the reviewer pane is dedicated to that run, tear it down. If the pane is shared across attempts, only mark the run cancelled.

Retry behavior:

- Do not overwrite old failed/cancelled runs.
- Create a new attempt with `attempt = previous_attempt + 1`.
- Preserve worker, harness, PR URL, and target SHA unless the PR head changed.

### Aggregate review summary

Expose one top-level summary per worker/SHA review batch so the UI can answer "what is the review outcome?" without making the user inspect every run.

The aggregate should be derived in service/API logic from review data, not separately stored. Inputs should include:

- `review_run.status`
- `review_run.verdict`
- reviewer `required` flag
- current PR head SHA vs run target SHA

Possible aggregate states:

- `not_started`
- `in_progress`
- `approved`
- `changes_requested`
- `failed`
- `cancelled`
- `stale`
- `mixed`

Example rules:

- Any required reviewer with `changes_requested` -> aggregate `changes_requested`.
- All required reviewers approved -> aggregate `approved`.
- Required reviewer still running/launching -> aggregate `in_progress`.
- Required reviewer failed with no successful retry -> aggregate `failed`.
- Optional reviewers disagree while required reviewers pass -> aggregate may be `mixed` or `approved_with_notes`.
- PR head changes after a batch -> aggregate `stale`.

Reviewer prompts should produce the verdict/body, but aggregation should remain deterministic service logic so the UI gets consistent results.

### Per-reviewer terminal handles

Per-reviewer terminal handles are part of the trigger/list/review cockpit flow.

Move from one review-level `reviewerHandleId` to per-run handles. When a run has an interactive reviewer, expose its `reviewerHandleId`; one-shot reviewers can return an empty handle.

Flow:

1. User/API/automation triggers review.
2. AO creates or reuses reviewer runs.
3. Each run response/list response includes its own `reviewerHandleId` when applicable.
4. UI shows an attach action only for runs with a handle.

Example API response direction:

```json
{
  "summary": {
    "status": "changes_requested",
    "requiredApproved": 1,
    "requiredTotal": 2
  },
  "groups": [
    {
      "targetSha": "abc123",
      "status": "changes_requested",
      "runs": [
        {
          "id": "run-1",
          "harness": "claude-code",
          "required": true,
          "status": "running",
          "verdict": "",
          "reviewerHandleId": "review-mer-4-claude-code"
        },
        {
          "id": "run-2",
          "harness": "greptile",
          "required": false,
          "status": "submitted",
          "verdict": "approved",
          "reviewerHandleId": ""
        }
      ]
    }
  ]
}
```

### Read-only reviewer behavior

Reviewers should be read-only/restrictive by default, not optionally read-only through config.

Default behavior:

- Reviewer sessions should inspect and comment, not mutate worker checkouts.
- Do not inherit worker `bypass-permissions` into reviewer sessions.
- Launch reviewer harnesses with the most restrictive available permission mode.
- Make the review prompt explicit: do not modify files, do not commit, inspect and submit review only.

Reviewer config should be used for reviewer-specific settings, not to opt into read-only behavior:

```yaml
review:
  mode: auto
  reviewers:
    - harness: claude-code
      required: true
      config:
        model: claude-opus-4-5
```

Later, we can discuss exceptional overrides or stronger isolation such as read-only worktree copies/runtime restrictions.

## Segment 2: Product Direction - Better UX Than Greptile

The goal is not to outsmart Greptile. The goal is to make AO feel better for agent-work review because AO owns the worker sessions, PR facts, terminals, lifecycle, and follow-up loop.

### Review cockpit per worker

For every worker session, expose a dedicated review panel showing:

- Current PR URL and head SHA.
- Current review batch status.
- Assigned reviewers.
- Each reviewer lifecycle state and verdict.
- Attach buttons for interactive reviewer terminals.
- Retry/cancel controls.
- Whether the review is stale because the worker pushed new commits.

The user should not need to jump between GitHub, terminal panes, logs, and the session board to understand review status.

### State transparency as a differentiator

AO should make review progress legible with only the states we can actually support:

- Launching reviewer
- Reviewing commit
- Submitted verdict
- Failed with reason
- Cancelled
- Stale after new worker commit

Opaque bots leave users wondering whether anything is happening. AO should make that impossible.

### Review batches as the user-facing unit

Group reviewer runs by `worker session + target SHA`.

Example UI concept:

```text
Review batch for mer-4 @ abc123
- Claude Code: changes requested
- Greptile: approved
- Codex: running
Overall: changes requested
```

This maps to how users think: "review the worker current PR state."

### Actionable aggregate outcome

The UI should answer:

- Can this worker merge?
- Is re-review needed?
- Which reviewer is blocking?
- Is the blocker actual code feedback or reviewer infrastructure failure?
- Did the worker push after the review?

Example summary:

```text
Changes requested by Claude Code.
Greptile approved.
Worker pushed 2 commits after this batch: re-review recommended.
```

### Follow-up loop

After a reviewer requests changes, AO should offer direct next actions:

- Send review feedback to worker.
- Ask worker to fix.
- Retry reviewer.
- Cancel stuck reviewer.
- Trigger re-review after new commits.
- Mark optional reviewer result as acknowledged.

This is where AO can outperform plain PR bots: it controls the worker-reviewer workflow, not just the PR comment.

### Reviewer terminal as a first-class feature

For interactive reviewers, terminal access is a major UX advantage:

- Attach to reviewer.
- See what it is doing.
- Nudge or recover it.
- Handle auth/input prompts.
- Inspect failed launches.

This should be visible per reviewer, not hidden behind a single review-level handle.

### Audit trail grouped by commit

Keep review history grouped by commit/SHA:

```text
abc123
- Claude Code: changes requested
- Greptile: approved

def456
- Claude Code: approved
- Greptile: not run
```

This helps users understand whether feedback was addressed and whether the current PR head has actually been reviewed.

## Proposed V2 Implementation Order

1. Add review config shape with review-level `mode: auto | manual` and reviewer-level `required`.
2. Add post-onboarding config update support/design for reviewer settings.
3. Add `ao review trigger` and daemon/API trigger behavior for manual mode.
4. Add review batch grouping by worker + PR head SHA.
5. Add multiple reviewer runs per batch.
6. Add minimal per-run lifecycle state and per-run terminal handle.
7. Add derived aggregate review summary based on required/advisory reviewers.
8. Add retry/cancel endpoints.
9. Enforce read-only/restrictive reviewer behavior by default.
10. Add UI review cockpit for worker sessions.
11. Add "send review feedback to worker" follow-up action.

## Acceptance Criteria

- A project can configure more than one reviewer.
- Review trigger behavior supports `auto` and `manual` modes at the review config level.
- Each reviewer config can be marked `required: true` or `required: false`.
- There is a documented or implemented path to update reviewer config after project onboarding.
- If no reviewer config is present, V1 fallback behavior is preserved: worker harness when supported, otherwise default reviewer harness.
- `ao review trigger <worker-session-id>` exists and triggers review through the daemon.
- Triggering review for a worker creates/reuses one run per selected reviewer.
- The API exposes per-reviewer run state, verdict, target SHA, attempt, required flag, and terminal handle.
- The API derives and exposes an aggregate review summary for the worker current PR head.
- Users can retry or cancel a review run.
- Reviewer runs become visibly stale when the PR head changes.
- Reviewer sessions default to read-only/restrictive behavior where supported.
- The UI shows a worker-level review cockpit with batch status, reviewers, controls, and terminal attach actions.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Review V2: multi-reviewer worker review cockpit #236

Context

Segment 1: V2 Worker-Reviewer Capabilities

Multi-reviewer support per worker/commit

Reviewer selection and fallback behavior

Post-onboarding reviewer config edits

Explicit reviewer/run states

CLI trigger command

Cancel and retry controls

Aggregate review summary

Per-reviewer terminal handles

Read-only reviewer behavior

Segment 2: Product Direction - Better UX Than Greptile

Review cockpit per worker

State transparency as a differentiator

Review batches as the user-facing unit

Actionable aggregate outcome

Follow-up loop

Reviewer terminal as a first-class feature

Audit trail grouped by commit

Proposed V2 Implementation Order

Acceptance Criteria

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Review V2: multi-reviewer worker review cockpit #236

Description

Context

Segment 1: V2 Worker-Reviewer Capabilities

Multi-reviewer support per worker/commit

Reviewer selection and fallback behavior

Post-onboarding reviewer config edits

Explicit reviewer/run states

CLI trigger command

Cancel and retry controls

Aggregate review summary

Per-reviewer terminal handles

Read-only reviewer behavior

Segment 2: Product Direction - Better UX Than Greptile

Review cockpit per worker

State transparency as a differentiator

Review batches as the user-facing unit

Actionable aggregate outcome

Follow-up loop

Reviewer terminal as a first-class feature

Audit trail grouped by commit

Proposed V2 Implementation Order

Acceptance Criteria

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions