Skip to content

Review V2: multi-reviewer worker review cockpit #236

@Vaibhaav-Tiwari

Description

@Vaibhaav-Tiwari

@AgentWrapper @harshitsinghbhandari @neversettle17-101

Context

PR #197 introduced V1 of AO's worker-reviewer system: a worker session can have its PR reviewed by a reviewer harness, review runs are persisted, the reviewer can submit a verdict/body, and the UI/API can attach to the reviewer pane.

V2 should build on that foundation. The goal is not to build a smarter reviewer model. The goal is to make AO a better review operations layer around worker sessions: more visible, controllable, auditable, and ergonomic than opaque review bots.

Segment 1: V2 Worker-Reviewer Capabilities

Multi-reviewer support per worker/commit

Support multiple configured reviewers for the same worker PR head SHA.

Example project config shape:

review:
  mode: auto
  reviewers:
    - harness: claude-code
      required: true
    - harness: greptile
      required: false
    - harness: codex
      required: false

Expected behavior:

  • Triggering a review creates a grouped review batch for worker session + PR head SHA.
  • Each selected reviewer gets its own review run.
  • Re-triggering the same worker/SHA should reuse existing runs unless explicitly retried.
  • Storage uniqueness should move toward session_id + target_sha + reviewer_harness + attempt.

Reviewer selection and fallback behavior

Keep the initial policy model simple.

Review-level mode:

  • auto: AO may automatically trigger reviews when the worker PR reaches a reviewable state or the PR head changes.
  • manual: AO never auto-triggers review; the user explicitly starts review from the UI/API/CLI.

Reviewer-level requirement:

  • required: true: this reviewer is blocking for aggregate review status and future merge readiness.
  • required: false: this reviewer is advisory; its result is visible, but it should not block the aggregate required-review gate.

Fallback behavior:

  • If review.reviewers is configured and non-empty, use that list.
  • If review.reviewers is not configured, keep V1 behavior: resolve the reviewer harness from the worker harness when supported by the reviewer registry.
  • If the worker harness is not a supported reviewer harness, fall back to the default reviewer harness, currently claude-code.

This avoids confusing modes like all, primary_with_fallback, and required_optional for V2. Running all configured reviewers is implied by listing multiple reviewers. Required vs optional is represented per reviewer with required.

Post-onboarding reviewer config edits

V2 should account for changing review config after a project is already registered/onboarded.

Open design question:

  • How should users add/remove reviewer harnesses or change required after project creation?
  • Should this be supported through existing project config editing, a dedicated CLI command, the UI, or all three?
  • What happens to existing review batches when reviewer config changes? Existing historical runs should remain intact; new triggers should use the latest config.

This does not need to block the first backend schema work, but it should be considered part of the V2 product flow.

Explicit reviewer/run states

Keep lifecycle states minimal and tied to real flows. Do not add states until there is a concrete state transition that uses them.

Initial status vocabulary:

  • launching: AO accepted the trigger and is preparing/starting the reviewer.
  • running: reviewer process/pane exists and review is in progress.
  • submitted: reviewer submitted a result.
  • failed: launch/runtime/reviewer flow failed.
  • cancelled: user cancelled the run.
  • stale: PR head changed after the run was created/submitted.

Do not add queued until AO has a real queue/backpressure system. Until then, launching covers ?requested but not running yet.?

Do not add waiting_input in the first pass unless we can reliably derive it from reviewer/runtime signals. If we can derive it for an interactive harness later, add it then.

Keep verdict separate from lifecycle state:

  • approved
  • changes_requested
  • empty/none while pending

This lets AO say things like:

  • "Launching Claude reviewer"
  • "Reviewing commit abc123"
  • "Submitted changes requested"
  • "Stale: worker pushed a newer commit"
  • "Failed to launch: Claude binary not found"

CLI trigger command

Add explicit CLI support for manual review triggering:

ao review trigger <worker-session-id>
ao review trigger --session <worker-session-id>

In manual mode, this is the primary trigger path. In auto mode, AO automation or a worker session may use the same daemon route when review should start.

Open discussion:

  • Should workers directly run ao review trigger when auto mode is enabled, or should the daemon trigger from observed PR/session state?
  • Should initial V2 support flags like --reviewer, --retry, or keep trigger minimal first?

Cancel and retry controls

Add review run controls:

POST /api/v1/sessions/{sessionId}/reviews/{runId}/cancel
POST /api/v1/sessions/{sessionId}/reviews/{runId}/retry

Cancel behavior:

  • Preserve history; do not delete the run.
  • Mark the run cancelled.
  • If the reviewer pane is dedicated to that run, tear it down. If the pane is shared across attempts, only mark the run cancelled.

Retry behavior:

  • Do not overwrite old failed/cancelled runs.
  • Create a new attempt with attempt = previous_attempt + 1.
  • Preserve worker, harness, PR URL, and target SHA unless the PR head changed.

Aggregate review summary

Expose one top-level summary per worker/SHA review batch so the UI can answer "what is the review outcome?" without making the user inspect every run.

The aggregate should be derived in service/API logic from review data, not separately stored. Inputs should include:

  • review_run.status
  • review_run.verdict
  • reviewer required flag
  • current PR head SHA vs run target SHA

Possible aggregate states:

  • not_started
  • in_progress
  • approved
  • changes_requested
  • failed
  • cancelled
  • stale
  • mixed

Example rules:

  • Any required reviewer with changes_requested -> aggregate changes_requested.
  • All required reviewers approved -> aggregate approved.
  • Required reviewer still running/launching -> aggregate in_progress.
  • Required reviewer failed with no successful retry -> aggregate failed.
  • Optional reviewers disagree while required reviewers pass -> aggregate may be mixed or approved_with_notes.
  • PR head changes after a batch -> aggregate stale.

Reviewer prompts should produce the verdict/body, but aggregation should remain deterministic service logic so the UI gets consistent results.

Per-reviewer terminal handles

Per-reviewer terminal handles are part of the trigger/list/review cockpit flow.

Move from one review-level reviewerHandleId to per-run handles. When a run has an interactive reviewer, expose its reviewerHandleId; one-shot reviewers can return an empty handle.

Flow:

  1. User/API/automation triggers review.
  2. AO creates or reuses reviewer runs.
  3. Each run response/list response includes its own reviewerHandleId when applicable.
  4. UI shows an attach action only for runs with a handle.

Example API response direction:

{
  "summary": {
    "status": "changes_requested",
    "requiredApproved": 1,
    "requiredTotal": 2
  },
  "groups": [
    {
      "targetSha": "abc123",
      "status": "changes_requested",
      "runs": [
        {
          "id": "run-1",
          "harness": "claude-code",
          "required": true,
          "status": "running",
          "verdict": "",
          "reviewerHandleId": "review-mer-4-claude-code"
        },
        {
          "id": "run-2",
          "harness": "greptile",
          "required": false,
          "status": "submitted",
          "verdict": "approved",
          "reviewerHandleId": ""
        }
      ]
    }
  ]
}

Read-only reviewer behavior

Reviewers should be read-only/restrictive by default, not optionally read-only through config.

Default behavior:

  • Reviewer sessions should inspect and comment, not mutate worker checkouts.
  • Do not inherit worker bypass-permissions into reviewer sessions.
  • Launch reviewer harnesses with the most restrictive available permission mode.
  • Make the review prompt explicit: do not modify files, do not commit, inspect and submit review only.

Reviewer config should be used for reviewer-specific settings, not to opt into read-only behavior:

review:
  mode: auto
  reviewers:
    - harness: claude-code
      required: true
      config:
        model: claude-opus-4-5

Later, we can discuss exceptional overrides or stronger isolation such as read-only worktree copies/runtime restrictions.

Segment 2: Product Direction - Better UX Than Greptile

The goal is not to outsmart Greptile. The goal is to make AO feel better for agent-work review because AO owns the worker sessions, PR facts, terminals, lifecycle, and follow-up loop.

Review cockpit per worker

For every worker session, expose a dedicated review panel showing:

  • Current PR URL and head SHA.
  • Current review batch status.
  • Assigned reviewers.
  • Each reviewer lifecycle state and verdict.
  • Attach buttons for interactive reviewer terminals.
  • Retry/cancel controls.
  • Whether the review is stale because the worker pushed new commits.

The user should not need to jump between GitHub, terminal panes, logs, and the session board to understand review status.

State transparency as a differentiator

AO should make review progress legible with only the states we can actually support:

  • Launching reviewer
  • Reviewing commit
  • Submitted verdict
  • Failed with reason
  • Cancelled
  • Stale after new worker commit

Opaque bots leave users wondering whether anything is happening. AO should make that impossible.

Review batches as the user-facing unit

Group reviewer runs by worker session + target SHA.

Example UI concept:

Review batch for mer-4 @ abc123
- Claude Code: changes requested
- Greptile: approved
- Codex: running
Overall: changes requested

This maps to how users think: "review the worker current PR state."

Actionable aggregate outcome

The UI should answer:

  • Can this worker merge?
  • Is re-review needed?
  • Which reviewer is blocking?
  • Is the blocker actual code feedback or reviewer infrastructure failure?
  • Did the worker push after the review?

Example summary:

Changes requested by Claude Code.
Greptile approved.
Worker pushed 2 commits after this batch: re-review recommended.

Follow-up loop

After a reviewer requests changes, AO should offer direct next actions:

  • Send review feedback to worker.
  • Ask worker to fix.
  • Retry reviewer.
  • Cancel stuck reviewer.
  • Trigger re-review after new commits.
  • Mark optional reviewer result as acknowledged.

This is where AO can outperform plain PR bots: it controls the worker-reviewer workflow, not just the PR comment.

Reviewer terminal as a first-class feature

For interactive reviewers, terminal access is a major UX advantage:

  • Attach to reviewer.
  • See what it is doing.
  • Nudge or recover it.
  • Handle auth/input prompts.
  • Inspect failed launches.

This should be visible per reviewer, not hidden behind a single review-level handle.

Audit trail grouped by commit

Keep review history grouped by commit/SHA:

abc123
- Claude Code: changes requested
- Greptile: approved

def456
- Claude Code: approved
- Greptile: not run

This helps users understand whether feedback was addressed and whether the current PR head has actually been reviewed.

Proposed V2 Implementation Order

  1. Add review config shape with review-level mode: auto | manual and reviewer-level required.
  2. Add post-onboarding config update support/design for reviewer settings.
  3. Add ao review trigger and daemon/API trigger behavior for manual mode.
  4. Add review batch grouping by worker + PR head SHA.
  5. Add multiple reviewer runs per batch.
  6. Add minimal per-run lifecycle state and per-run terminal handle.
  7. Add derived aggregate review summary based on required/advisory reviewers.
  8. Add retry/cancel endpoints.
  9. Enforce read-only/restrictive reviewer behavior by default.
  10. Add UI review cockpit for worker sessions.
  11. Add "send review feedback to worker" follow-up action.

Acceptance Criteria

  • A project can configure more than one reviewer.
  • Review trigger behavior supports auto and manual modes at the review config level.
  • Each reviewer config can be marked required: true or required: false.
  • There is a documented or implemented path to update reviewer config after project onboarding.
  • If no reviewer config is present, V1 fallback behavior is preserved: worker harness when supported, otherwise default reviewer harness.
  • ao review trigger <worker-session-id> exists and triggers review through the daemon.
  • Triggering review for a worker creates/reuses one run per selected reviewer.
  • The API exposes per-reviewer run state, verdict, target SHA, attempt, required flag, and terminal handle.
  • The API derives and exposes an aggregate review summary for the worker current PR head.
  • Users can retry or cancel a review run.
  • Reviewer runs become visibly stale when the PR head changes.
  • Reviewer sessions default to read-only/restrictive behavior where supported.
  • The UI shows a worker-level review cockpit with batch status, reviewers, controls, and terminal attach actions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    futureA feature to be worked on in the future.needs-reviewAuthor signals ready for review

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions