@AgentWrapper @harshitsinghbhandari @neversettle17-101
Context
PR #197 introduced V1 of AO's worker-reviewer system: a worker session can have its PR reviewed by a reviewer harness, review runs are persisted, the reviewer can submit a verdict/body, and the UI/API can attach to the reviewer pane.
V2 should build on that foundation. The goal is not to build a smarter reviewer model. The goal is to make AO a better review operations layer around worker sessions: more visible, controllable, auditable, and ergonomic than opaque review bots.
Segment 1: V2 Worker-Reviewer Capabilities
Multi-reviewer support per worker/commit
Support multiple configured reviewers for the same worker PR head SHA.
Example project config shape:
review:
mode: auto
reviewers:
- harness: claude-code
required: true
- harness: greptile
required: false
- harness: codex
required: false
Expected behavior:
- Triggering a review creates a grouped review batch for
worker session + PR head SHA.
- Each selected reviewer gets its own review run.
- Re-triggering the same worker/SHA should reuse existing runs unless explicitly retried.
- Storage uniqueness should move toward
session_id + target_sha + reviewer_harness + attempt.
Reviewer selection and fallback behavior
Keep the initial policy model simple.
Review-level mode:
auto: AO may automatically trigger reviews when the worker PR reaches a reviewable state or the PR head changes.
manual: AO never auto-triggers review; the user explicitly starts review from the UI/API/CLI.
Reviewer-level requirement:
required: true: this reviewer is blocking for aggregate review status and future merge readiness.
required: false: this reviewer is advisory; its result is visible, but it should not block the aggregate required-review gate.
Fallback behavior:
- If
review.reviewers is configured and non-empty, use that list.
- If
review.reviewers is not configured, keep V1 behavior: resolve the reviewer harness from the worker harness when supported by the reviewer registry.
- If the worker harness is not a supported reviewer harness, fall back to the default reviewer harness, currently
claude-code.
This avoids confusing modes like all, primary_with_fallback, and required_optional for V2. Running all configured reviewers is implied by listing multiple reviewers. Required vs optional is represented per reviewer with required.
Post-onboarding reviewer config edits
V2 should account for changing review config after a project is already registered/onboarded.
Open design question:
- How should users add/remove reviewer harnesses or change
required after project creation?
- Should this be supported through existing project config editing, a dedicated CLI command, the UI, or all three?
- What happens to existing review batches when reviewer config changes? Existing historical runs should remain intact; new triggers should use the latest config.
This does not need to block the first backend schema work, but it should be considered part of the V2 product flow.
Explicit reviewer/run states
Keep lifecycle states minimal and tied to real flows. Do not add states until there is a concrete state transition that uses them.
Initial status vocabulary:
launching: AO accepted the trigger and is preparing/starting the reviewer.
running: reviewer process/pane exists and review is in progress.
submitted: reviewer submitted a result.
failed: launch/runtime/reviewer flow failed.
cancelled: user cancelled the run.
stale: PR head changed after the run was created/submitted.
Do not add queued until AO has a real queue/backpressure system. Until then, launching covers ?requested but not running yet.?
Do not add waiting_input in the first pass unless we can reliably derive it from reviewer/runtime signals. If we can derive it for an interactive harness later, add it then.
Keep verdict separate from lifecycle state:
approved
changes_requested
- empty/none while pending
This lets AO say things like:
- "Launching Claude reviewer"
- "Reviewing commit abc123"
- "Submitted changes requested"
- "Stale: worker pushed a newer commit"
- "Failed to launch: Claude binary not found"
CLI trigger command
Add explicit CLI support for manual review triggering:
ao review trigger <worker-session-id>
ao review trigger --session <worker-session-id>
In manual mode, this is the primary trigger path. In auto mode, AO automation or a worker session may use the same daemon route when review should start.
Open discussion:
- Should workers directly run
ao review trigger when auto mode is enabled, or should the daemon trigger from observed PR/session state?
- Should initial V2 support flags like
--reviewer, --retry, or keep trigger minimal first?
Cancel and retry controls
Add review run controls:
POST /api/v1/sessions/{sessionId}/reviews/{runId}/cancel
POST /api/v1/sessions/{sessionId}/reviews/{runId}/retry
Cancel behavior:
- Preserve history; do not delete the run.
- Mark the run
cancelled.
- If the reviewer pane is dedicated to that run, tear it down. If the pane is shared across attempts, only mark the run cancelled.
Retry behavior:
- Do not overwrite old failed/cancelled runs.
- Create a new attempt with
attempt = previous_attempt + 1.
- Preserve worker, harness, PR URL, and target SHA unless the PR head changed.
Aggregate review summary
Expose one top-level summary per worker/SHA review batch so the UI can answer "what is the review outcome?" without making the user inspect every run.
The aggregate should be derived in service/API logic from review data, not separately stored. Inputs should include:
review_run.status
review_run.verdict
- reviewer
required flag
- current PR head SHA vs run target SHA
Possible aggregate states:
not_started
in_progress
approved
changes_requested
failed
cancelled
stale
mixed
Example rules:
- Any required reviewer with
changes_requested -> aggregate changes_requested.
- All required reviewers approved -> aggregate
approved.
- Required reviewer still running/launching -> aggregate
in_progress.
- Required reviewer failed with no successful retry -> aggregate
failed.
- Optional reviewers disagree while required reviewers pass -> aggregate may be
mixed or approved_with_notes.
- PR head changes after a batch -> aggregate
stale.
Reviewer prompts should produce the verdict/body, but aggregation should remain deterministic service logic so the UI gets consistent results.
Per-reviewer terminal handles
Per-reviewer terminal handles are part of the trigger/list/review cockpit flow.
Move from one review-level reviewerHandleId to per-run handles. When a run has an interactive reviewer, expose its reviewerHandleId; one-shot reviewers can return an empty handle.
Flow:
- User/API/automation triggers review.
- AO creates or reuses reviewer runs.
- Each run response/list response includes its own
reviewerHandleId when applicable.
- UI shows an attach action only for runs with a handle.
Example API response direction:
{
"summary": {
"status": "changes_requested",
"requiredApproved": 1,
"requiredTotal": 2
},
"groups": [
{
"targetSha": "abc123",
"status": "changes_requested",
"runs": [
{
"id": "run-1",
"harness": "claude-code",
"required": true,
"status": "running",
"verdict": "",
"reviewerHandleId": "review-mer-4-claude-code"
},
{
"id": "run-2",
"harness": "greptile",
"required": false,
"status": "submitted",
"verdict": "approved",
"reviewerHandleId": ""
}
]
}
]
}
Read-only reviewer behavior
Reviewers should be read-only/restrictive by default, not optionally read-only through config.
Default behavior:
- Reviewer sessions should inspect and comment, not mutate worker checkouts.
- Do not inherit worker
bypass-permissions into reviewer sessions.
- Launch reviewer harnesses with the most restrictive available permission mode.
- Make the review prompt explicit: do not modify files, do not commit, inspect and submit review only.
Reviewer config should be used for reviewer-specific settings, not to opt into read-only behavior:
review:
mode: auto
reviewers:
- harness: claude-code
required: true
config:
model: claude-opus-4-5
Later, we can discuss exceptional overrides or stronger isolation such as read-only worktree copies/runtime restrictions.
Segment 2: Product Direction - Better UX Than Greptile
The goal is not to outsmart Greptile. The goal is to make AO feel better for agent-work review because AO owns the worker sessions, PR facts, terminals, lifecycle, and follow-up loop.
Review cockpit per worker
For every worker session, expose a dedicated review panel showing:
- Current PR URL and head SHA.
- Current review batch status.
- Assigned reviewers.
- Each reviewer lifecycle state and verdict.
- Attach buttons for interactive reviewer terminals.
- Retry/cancel controls.
- Whether the review is stale because the worker pushed new commits.
The user should not need to jump between GitHub, terminal panes, logs, and the session board to understand review status.
State transparency as a differentiator
AO should make review progress legible with only the states we can actually support:
- Launching reviewer
- Reviewing commit
- Submitted verdict
- Failed with reason
- Cancelled
- Stale after new worker commit
Opaque bots leave users wondering whether anything is happening. AO should make that impossible.
Review batches as the user-facing unit
Group reviewer runs by worker session + target SHA.
Example UI concept:
Review batch for mer-4 @ abc123
- Claude Code: changes requested
- Greptile: approved
- Codex: running
Overall: changes requested
This maps to how users think: "review the worker current PR state."
Actionable aggregate outcome
The UI should answer:
- Can this worker merge?
- Is re-review needed?
- Which reviewer is blocking?
- Is the blocker actual code feedback or reviewer infrastructure failure?
- Did the worker push after the review?
Example summary:
Changes requested by Claude Code.
Greptile approved.
Worker pushed 2 commits after this batch: re-review recommended.
Follow-up loop
After a reviewer requests changes, AO should offer direct next actions:
- Send review feedback to worker.
- Ask worker to fix.
- Retry reviewer.
- Cancel stuck reviewer.
- Trigger re-review after new commits.
- Mark optional reviewer result as acknowledged.
This is where AO can outperform plain PR bots: it controls the worker-reviewer workflow, not just the PR comment.
Reviewer terminal as a first-class feature
For interactive reviewers, terminal access is a major UX advantage:
- Attach to reviewer.
- See what it is doing.
- Nudge or recover it.
- Handle auth/input prompts.
- Inspect failed launches.
This should be visible per reviewer, not hidden behind a single review-level handle.
Audit trail grouped by commit
Keep review history grouped by commit/SHA:
abc123
- Claude Code: changes requested
- Greptile: approved
def456
- Claude Code: approved
- Greptile: not run
This helps users understand whether feedback was addressed and whether the current PR head has actually been reviewed.
Proposed V2 Implementation Order
- Add review config shape with review-level
mode: auto | manual and reviewer-level required.
- Add post-onboarding config update support/design for reviewer settings.
- Add
ao review trigger and daemon/API trigger behavior for manual mode.
- Add review batch grouping by worker + PR head SHA.
- Add multiple reviewer runs per batch.
- Add minimal per-run lifecycle state and per-run terminal handle.
- Add derived aggregate review summary based on required/advisory reviewers.
- Add retry/cancel endpoints.
- Enforce read-only/restrictive reviewer behavior by default.
- Add UI review cockpit for worker sessions.
- Add "send review feedback to worker" follow-up action.
Acceptance Criteria
- A project can configure more than one reviewer.
- Review trigger behavior supports
auto and manual modes at the review config level.
- Each reviewer config can be marked
required: true or required: false.
- There is a documented or implemented path to update reviewer config after project onboarding.
- If no reviewer config is present, V1 fallback behavior is preserved: worker harness when supported, otherwise default reviewer harness.
ao review trigger <worker-session-id> exists and triggers review through the daemon.
- Triggering review for a worker creates/reuses one run per selected reviewer.
- The API exposes per-reviewer run state, verdict, target SHA, attempt, required flag, and terminal handle.
- The API derives and exposes an aggregate review summary for the worker current PR head.
- Users can retry or cancel a review run.
- Reviewer runs become visibly stale when the PR head changes.
- Reviewer sessions default to read-only/restrictive behavior where supported.
- The UI shows a worker-level review cockpit with batch status, reviewers, controls, and terminal attach actions.
@AgentWrapper @harshitsinghbhandari @neversettle17-101
Context
PR #197 introduced V1 of AO's worker-reviewer system: a worker session can have its PR reviewed by a reviewer harness, review runs are persisted, the reviewer can submit a verdict/body, and the UI/API can attach to the reviewer pane.
V2 should build on that foundation. The goal is not to build a smarter reviewer model. The goal is to make AO a better review operations layer around worker sessions: more visible, controllable, auditable, and ergonomic than opaque review bots.
Segment 1: V2 Worker-Reviewer Capabilities
Multi-reviewer support per worker/commit
Support multiple configured reviewers for the same worker PR head SHA.
Example project config shape:
Expected behavior:
worker session + PR head SHA.session_id + target_sha + reviewer_harness + attempt.Reviewer selection and fallback behavior
Keep the initial policy model simple.
Review-level mode:
auto: AO may automatically trigger reviews when the worker PR reaches a reviewable state or the PR head changes.manual: AO never auto-triggers review; the user explicitly starts review from the UI/API/CLI.Reviewer-level requirement:
required: true: this reviewer is blocking for aggregate review status and future merge readiness.required: false: this reviewer is advisory; its result is visible, but it should not block the aggregate required-review gate.Fallback behavior:
review.reviewersis configured and non-empty, use that list.review.reviewersis not configured, keep V1 behavior: resolve the reviewer harness from the worker harness when supported by the reviewer registry.claude-code.This avoids confusing modes like
all,primary_with_fallback, andrequired_optionalfor V2. Running all configured reviewers is implied by listing multiple reviewers. Required vs optional is represented per reviewer withrequired.Post-onboarding reviewer config edits
V2 should account for changing review config after a project is already registered/onboarded.
Open design question:
requiredafter project creation?This does not need to block the first backend schema work, but it should be considered part of the V2 product flow.
Explicit reviewer/run states
Keep lifecycle states minimal and tied to real flows. Do not add states until there is a concrete state transition that uses them.
Initial status vocabulary:
launching: AO accepted the trigger and is preparing/starting the reviewer.running: reviewer process/pane exists and review is in progress.submitted: reviewer submitted a result.failed: launch/runtime/reviewer flow failed.cancelled: user cancelled the run.stale: PR head changed after the run was created/submitted.Do not add
queueduntil AO has a real queue/backpressure system. Until then,launchingcovers ?requested but not running yet.?Do not add
waiting_inputin the first pass unless we can reliably derive it from reviewer/runtime signals. If we can derive it for an interactive harness later, add it then.Keep
verdictseparate from lifecycle state:approvedchanges_requestedThis lets AO say things like:
CLI trigger command
Add explicit CLI support for manual review triggering:
In
manualmode, this is the primary trigger path. Inautomode, AO automation or a worker session may use the same daemon route when review should start.Open discussion:
ao review triggerwhen auto mode is enabled, or should the daemon trigger from observed PR/session state?--reviewer,--retry, or keep trigger minimal first?Cancel and retry controls
Add review run controls:
Cancel behavior:
cancelled.Retry behavior:
attempt = previous_attempt + 1.Aggregate review summary
Expose one top-level summary per worker/SHA review batch so the UI can answer "what is the review outcome?" without making the user inspect every run.
The aggregate should be derived in service/API logic from review data, not separately stored. Inputs should include:
review_run.statusreview_run.verdictrequiredflagPossible aggregate states:
not_startedin_progressapprovedchanges_requestedfailedcancelledstalemixedExample rules:
changes_requested-> aggregatechanges_requested.approved.in_progress.failed.mixedorapproved_with_notes.stale.Reviewer prompts should produce the verdict/body, but aggregation should remain deterministic service logic so the UI gets consistent results.
Per-reviewer terminal handles
Per-reviewer terminal handles are part of the trigger/list/review cockpit flow.
Move from one review-level
reviewerHandleIdto per-run handles. When a run has an interactive reviewer, expose itsreviewerHandleId; one-shot reviewers can return an empty handle.Flow:
reviewerHandleIdwhen applicable.Example API response direction:
{ "summary": { "status": "changes_requested", "requiredApproved": 1, "requiredTotal": 2 }, "groups": [ { "targetSha": "abc123", "status": "changes_requested", "runs": [ { "id": "run-1", "harness": "claude-code", "required": true, "status": "running", "verdict": "", "reviewerHandleId": "review-mer-4-claude-code" }, { "id": "run-2", "harness": "greptile", "required": false, "status": "submitted", "verdict": "approved", "reviewerHandleId": "" } ] } ] }Read-only reviewer behavior
Reviewers should be read-only/restrictive by default, not optionally read-only through config.
Default behavior:
bypass-permissionsinto reviewer sessions.Reviewer config should be used for reviewer-specific settings, not to opt into read-only behavior:
Later, we can discuss exceptional overrides or stronger isolation such as read-only worktree copies/runtime restrictions.
Segment 2: Product Direction - Better UX Than Greptile
The goal is not to outsmart Greptile. The goal is to make AO feel better for agent-work review because AO owns the worker sessions, PR facts, terminals, lifecycle, and follow-up loop.
Review cockpit per worker
For every worker session, expose a dedicated review panel showing:
The user should not need to jump between GitHub, terminal panes, logs, and the session board to understand review status.
State transparency as a differentiator
AO should make review progress legible with only the states we can actually support:
Opaque bots leave users wondering whether anything is happening. AO should make that impossible.
Review batches as the user-facing unit
Group reviewer runs by
worker session + target SHA.Example UI concept:
This maps to how users think: "review the worker current PR state."
Actionable aggregate outcome
The UI should answer:
Example summary:
Follow-up loop
After a reviewer requests changes, AO should offer direct next actions:
This is where AO can outperform plain PR bots: it controls the worker-reviewer workflow, not just the PR comment.
Reviewer terminal as a first-class feature
For interactive reviewers, terminal access is a major UX advantage:
This should be visible per reviewer, not hidden behind a single review-level handle.
Audit trail grouped by commit
Keep review history grouped by commit/SHA:
This helps users understand whether feedback was addressed and whether the current PR head has actually been reviewed.
Proposed V2 Implementation Order
mode: auto | manualand reviewer-levelrequired.ao review triggerand daemon/API trigger behavior for manual mode.Acceptance Criteria
autoandmanualmodes at the review config level.required: trueorrequired: false.ao review trigger <worker-session-id>exists and triggers review through the daemon.