Skip to content

feat(lifecycle): ACT layer — reaction table + escalation engine (split B)#6

Merged
harshitsinghbhandari merged 3 commits into
feat/lcm-sm-contractsfrom
session/aa-4
May 27, 2026
Merged

feat(lifecycle): ACT layer — reaction table + escalation engine (split B)#6
harshitsinghbhandari merged 3 commits into
feat/lcm-sm-contractsfrom
session/aa-4

Conversation

@harshitsinghbhandari

Copy link
Copy Markdown
Collaborator

Summary

Implements split B of the Lifecycle Manager: the ACT layer that turns persisted status transitions into reactions and drives escalation. Builds on split A's synchronous Apply* pipeline (load→decide→diff→persist).

  • Reaction table (reactions.go) — the full distillation §4.2 default: ci-failed (send, retries 2, persistent), changes-requested/bugbot-comments (send, 30m), merge-conflicts (send, 15m), agent-idle (send, retries 2 + 15m), approved-and-green (notify — human decides to merge), agent-stuck/agent-needs-input/agent-exited (notify urgent), pr-closed (notify action), all-complete (notify info). auto-merge action kind exists but off by default (no row uses it). bugbot-comments/merge-conflicts are configured but dormant — split-A's decide core has no producer for them yet.
  • reactionEventFor — maps a canonical record → reaction key, mirroring DeriveLegacyStatus. Closed-PR is detected off the PR axis (it derives to idle).
  • Escalation engine — in-memory reactionTracker{attempts, escalated, firstAttemptAt} keyed by (sessionID, reactionKey). Numeric cap or duration → emit reaction.escalated + notify + silence further auto-dispatch. Send failures don't burn budget. Trackers clear on leaving the state, except ci-failed (persistent across fail→pending→fail; resets only when the PR leaves open / session terminal).
  • Real TickEscalations(now) — fires duration escalations the synchronous LCM can't wake itself for; never writes canonical state.
  • Wiring (manager.go, additive) — mutate returns a *transition; each Apply* path fires the mapped reaction after persist via the single synchronous react() chokepoint (the seam that keeps the async move localized later). OnKillRequested intentionally does not react (explicit kill ≠ inferred event). Split-A behavior unchanged.

Design decisions (agreed with @HARSHIT)

  • Synchronous dispatch through one react() chokepoint — the reaper/daemon that would own a worker goroutine is out of this split; making dispatch async later is a one-function change.
  • In-memory trackers — restart resets budgets (harmless: extra agent retries, never fewer human notifications); keeps ACT policy out of the canonical store.

Test plan

  • gofmt -l . clean
  • go build ./...
  • go vet ./...
  • go test -race ./...
  • Covers: right reaction per transition, numeric escalation (2 sends + 1 escalation), duration escalation via TickEscalations, non-persistent tracker clearing, pr-closed/merged, agent-exited on inferred death, explicit kill fires nothing.

🤖 Generated with Claude Code

…t B)

Add the ACT half of the LCM: map persisted status transitions to reactions
(send-to-agent / notify / auto-merge) and drive escalation.

- reactions.go: the §4.2 default reaction table, reactionEventFor (mirrors
  DeriveLegacyStatus for the ACT layer), in-memory per-(session,reaction)
  escalation trackers, the react() dispatch chokepoint, and a real
  TickEscalations for duration-based escalations the synchronous LCM can't
  wake itself for. auto-merge action exists but is off by default;
  bugbot-comments/merge-conflicts are configured but dormant (no decide-core
  producer yet).
- manager.go: mutate now returns a transition; each Apply* path fires the
  mapped reaction after persist via the single synchronous react() seam.
  OnKillRequested intentionally does not react (explicit kill != inferred
  event). Split-A load->decide->diff->persist behavior is unchanged.

ci-failed budget is persistent across fail->pending->fail oscillation;
non-persistent trackers reset when the status leaves the triggering state.
Escalation silences further auto-dispatch until the condition clears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@harshitsinghbhandari harshitsinghbhandari left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — ACT layer (split B), at 8b8da8e

Verified independently: gofmt clean, build/vet pass, go test -race green at 89.2% coverage. The reaction table matches §4.2, the wiring is clean (CI log tail threaded into the SCM reaction; OnKillRequested correctly discards its transition so an explicit kill fires nothing; react() runs outside the per-session lock so a busy-waiting send-to-agent doesn't hold the session mutex), and the synchronous-single-chokepoint design is well-documented. Two findings.

1. Persistent ci-failed tracker can leak / never clear (fix before merge)

react() only clears the tracker for beforeKey, and for a persistent reaction only when incidentOver(afterLC) holds at the moment it leaves that reaction. So the clear never happens if the session leaves ci-failed to another open-PR state first:

ci-failed → approved → merged
  └ leaving ci-failed: incidentOver(approved)=false (PR still open) → NOT cleared
                        leaving approved: only the approved tracker is considered
  ⇒ the ci-failed tracker is never cleared → leaks

Two consequences:

  • Memory leak: one orphaned reactionTracker per session that had a ci-failed then recovered before merging.
  • Stale silencing: if such a tracker already hit escalated=true, a later regression approved → ci-failed re-enters sendToAgent, sees escalated, and is silenced — the agent is never re-nudged and no fresh budget is granted, even though the prior incident effectively ended.

Suggested fix: when a transition reaches incidentOver(afterLC) (PR no longer open / session terminal), clear all trackers for that session, not just beforeKey. That also forces a decision worth making explicit: does a genuine recovery to approved/green count as the incident ending (re-arm the budget), or only a merge/close? The persistent flag was for fail→pending→fail flap — confirm it shouldn't also reset on a real recovery.

2. react() runs outside the per-session lock → unserialized dispatch (integration-time)

Persists are serialized under keyedMutex, but react() fires after the lock releases, so for a live daemon with concurrent observers (SCM poller + reaper + activity ingest) reactions for one session can interleave / run on a stale afterLC snapshot — e.g. a ci-failed send-to-agent dispatching after the session already moved to approved. Not observable in the current single-threaded tests, and the out-of-lock placement is the right call to avoid holding the mutex during a busy-wait. Flag for the integration step: either re-verify the session is still in the triggering state before dispatching, or give react() a per-session ordering (a small react queue). Worth a code comment now so it isn't forgotten.

Minor

  • Notify reactions have no cross-re-entry dedup: an approved → ci-failed → approved oscillation re-fires "ready to merge" each time it re-enters approved. Probably fine, but noisy under flap.
  • agent-stuck's §4.2 threshold: 10m isn't implemented (notify fires immediately on entering stuck). Likely redundant given the detecting→stuck debounce already gates it — fine to skip, just noting the divergence.

Recommendation: Approve once #1 is addressed (it's a real bug in the escalation engine, the core of this PR); #2 as a tracked integration-time item; minors optional. Nice work on the chokepoint design and the lock-free TickEscalations.

 review)

Address review finding #1: the persistent ci-failed tracker leaked and could
stale-silence a future regression. It was only cleared when leaving the
ci-failed reaction AND incidentOver held at that moment — so a recovery to
another open-PR state (ci-failed -> approved -> merged) never cleared it.

- react() now clears ALL of a session's trackers when the state REACHED is
  incident-over (PR resolved / session terminal) OR a genuine recovery
  (approved/mergeable, which the open-PR ladder guarantees means CI is no
  longer failing). Keyed on the state reached, not the one left, since the
  recovery transition is typically review_pending->approved (empty beforeKey).
- Persistent ci-failed still survives the ambiguous review_pending limbo, so
  fail->pending->fail keeps one shared budget (§4.2).
- Document the out-of-lock react() dispatch caveat for the daemon integration
  step (review #2) and the intentionally-skipped agent-stuck 10m threshold.

Tests: re-arm after a genuine recovery (regression re-nudges, not silenced);
all session trackers cleared once the incident is over.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@harshitsinghbhandari

Copy link
Copy Markdown
Collaborator Author

Thanks for the careful review — pushed b2161d5.

#1 (the real bug) — fixed. You're right, and the trace was spot-on. My original clear was keyed on beforeKey + incidentOver-at-leave-time, which never fires on ci-failed → approved → merged (leaving ci-failed to an open state, then the merge transition only considers the approved tracker). I restructured so clearing is keyed on the state reached, not the one left:

  • react() now clears all of a session's trackers when afterLC is either incidentOver (PR merged/closed or session terminal) or recovered (approved/mergeable). This kills the leak and the stale escalated=true silencing.
  • On your explicit question — yes, a genuine recovery re-arms the budget. recovered = approved/merge_ready, which the open-PR ladder guarantees means CI is no longer failing (it ranks ci_failing above approved, so the two can't co-display). A later regression approved → ci-failed now gets a fresh budget and re-nudges the agent.
  • The persistent flag still does its §4.2 job: it survives only the ambiguous review_pending limbo (where CISummary=pending is indistinguishable from CI-passed-awaiting-review at the canonical level), so fail→pending→fail keeps one shared budget. It no longer survives a real recovery.
  • New tests: TestReaction_CIFailedRearmsOnGenuineRecovery (regression after recovery re-nudges, not silenced) and TestReaction_IncidentOverClearsAllSessionTrackers (no tracker survives a merge).

#2 (out-of-lock dispatch) — documented as a tracked integration item. Added a comment on react(): under a live daemon with concurrent observers the afterLC snapshot can be stale by dispatch time; when the daemon lands, either re-check the triggering state before dispatching or give react() a per-session react queue. Kept the dispatch outside the lock as you noted (so a busy-waiting send never holds the mutex).

Minors:

  • agent-stuck threshold: 10m — added a comment explaining it's intentionally not gated (entry into stuck is already debounced upstream by the detecting→stuck quarantine, so a second timer is redundant).
  • notify re-entry dedup on approved → ci-failed → approved flap — left as-is per your call; noting it's bounded by how often a PR realistically flaps green↔red.

Gates green after the fix: gofmt clean, build, vet, go test -race ./....

@harshitsinghbhandari harshitsinghbhandari left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review — b2161d5

Verified independently: gofmt clean, build/vet pass, go test -race green at 89.3% coverage. Finding #1 is correctly resolved, finding #2 documented.

#1 — persistent-tracker lifecycle ✅ The clear is now keyed on the state reached (incidentOver || recovered) and clears all of the session's trackers there, with recovered scoped to unambiguous green (approved/merge_ready) — which, since the open-PR ladder ranks ci_failing above approval, can't coexist with failing CI. This is the right call and resolves the open question well:

  • ci-failed → approved → merged no longer leaks the ci-failed tracker (cleared on reaching approved).
  • a regression approved → ci-failed re-arms with a fresh budget instead of staying silenced.
  • the fail → pending → fail flap still drains one budget — review_pending/pr_open are neither incident-over nor recovered, so the persistent tracker survives the limbo.
  • covered by TestReaction_CIFailedRearmsOnGenuineRecovery and TestReaction_IncidentOverClearsAllSessionTrackers.

#2 — out-of-lock dispatch ✅ Documented as an integration-time caveat on react() with the concrete options (per-session react queue / re-check before dispatch). Good enough until the daemon runs concurrent observers.

agent-stuck threshold — documented why it's intentionally skipped (upstream detecting→stuck debounce). Agreed.

LGTM — approving. Clean turnaround.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Implements the Lifecycle Manager ACT layer, adding reaction dispatch and escalation behavior after persisted lifecycle transitions.

Changes:

  • Adds the default reaction table, reaction mapping, tracker-based escalation, and TickEscalations.
  • Wires Apply* paths to return persisted transitions and invoke react() after mutation.
  • Adds reaction-focused tests and replaces the prior no-op escalation test.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File Description
backend/internal/lifecycle/reactions.go Adds reaction definitions, dispatch, escalation tracking, and duration tick handling.
backend/internal/lifecycle/reactions_test.go Adds tests for reaction mapping, escalation, tracker clearing, and tick behavior.
backend/internal/lifecycle/manager.go Wires lifecycle mutations to post-persist reactions and initializes tracker state.
backend/internal/lifecycle/manager_test.go Removes the obsolete no-op TickEscalations test.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread backend/internal/lifecycle/manager.go
Comment thread backend/internal/lifecycle/reactions.go Outdated
Comment thread backend/internal/lifecycle/reactions.go Outdated
Comment thread backend/internal/lifecycle/reactions.go Outdated
Comment thread backend/internal/lifecycle/reactions.go
…escalation (Copilot review)

- OnKillRequested now clears the session's escalation trackers after a
  successful kill, so a later duration-based TickEscalations can't emit
  reaction.escalated for a dead session (dispatch is still skipped).
- sendToAgent rolls back the attempt (and firstAttemptAt when it set it) on a
  messenger.Send error, so undelivered messages don't march a reaction toward
  escalation — honoring "send failures retry next tick" (§4.3).
- Duration escalation now uses an inclusive boundary (>=) in both shouldEscalate
  and TickEscalations, so a 30m reaction escalates at exactly 30m instead of
  waiting for the next tick.

Tests: kill clears trackers + no post-kill escalation; repeated failed delivery
never escalates; duration escalation fires at exactly escalateAfter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@harshitsinghbhandari harshitsinghbhandari left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review — 5081cf7 (Copilot fixes)

Verified the delta over the LGTM'd b2161d5; gofmt clean, build/vet pass, go test -race green at 89.3%. All five Copilot fixes are sound:

  • OnKillRequested clears the session's trackers — correct; a kill is terminal but fires no reaction, so without this a leftover tracker could later emit reaction.escalated for a dead session via TickEscalations. Covered by TestReaction_KillClearsEscalationTrackers.
  • Send-error attempt rollback — decrements attempts (and resets firstAttemptAt only when this call set it, via freshFirst) under trackerMu, so an undelivered message doesn't march toward escalation (§4.3). Covered by TestReaction_SendFailureDoesNotBurnBudget.
  • Inclusive >= boundary — applied consistently in both shouldEscalate and TickEscalations, so escalation fires at exactly escalateAfter instead of waiting a tick.
  • Out-of-lock dispatch note — = my finding #2, documented and deferred to daemon integration.

Still LGTM. Good to merge.

@harshitsinghbhandari harshitsinghbhandari merged commit f70368b into feat/lcm-sm-contracts May 27, 2026
2 checks passed
neversettle17-101 added a commit that referenced this pull request Jun 12, 2026
…viewer to worker harness

Per PR #197 review feedback:
- Reviewer agent posts its review to the PR itself, so remove the
  ports.PRReviewPoster port, the GitHub review poster, the submit HTTP
  route + DTO, and the service Submit method (#1, #4, #7).
- Trigger spawns the reviewer agent over the worker's worktree with its
  own review prompt, mirroring the session launch flow (resolve agent by
  harness -> argv -> runtime.Create) (#8, #9).
- Default reviewer harness reuses the worker's harness when supported,
  falling back to claude-code; reviewer config stays independent of the
  worker override (#5, #6).
- Drop the `ao review` CLI for this PR's scope (#2, #3).

Regenerated OpenAPI + TS types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Vaibhaav-Tiwari pushed a commit that referenced this pull request Jun 14, 2026
* feat(review): configurable AO code review backend (V1)

Add per-project configurable code review of a worker's PR. A reviewer
agent runs one-shot over the worker's own worktree and posts its result
to the PR; the worker picks the feedback up through the existing SCM
observer review-nudge path.

- domain: ProjectConfig.reviewers (+ default reviewer harness), Review /
  ReviewRun types and verdict/status vocab.
- storage: review + review_run tables (0011), sqlc queries, store methods.
- service/review: rewrite the in-memory stub as a persisted ReviewService
  (Trigger/Submit/List) with a reviewer Runner over agent resolver +
  runtime; ports.PRReviewPoster implemented on the GitHub adapter.
- http: session-scoped routes POST /sessions/{id}/reviews/trigger,
  POST .../submit, GET .../reviews; regenerated OpenAPI + TS types.
- cli: ao review trigger|submit|list.
- frontend: adapt ReviewDashboard to the per-worker reviews API.

Closes #192

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): address review — drop submit/poster/CLI, default reviewer to worker harness

Per PR #197 review feedback:
- Reviewer agent posts its review to the PR itself, so remove the
  ports.PRReviewPoster port, the GitHub review poster, the submit HTTP
  route + DTO, and the service Submit method (#1, #4, #7).
- Trigger spawns the reviewer agent over the worker's worktree with its
  own review prompt, mirroring the session launch flow (resolve agent by
  harness -> argv -> runtime.Create) (#8, #9).
- Default reviewer harness reuses the worker's harness when supported,
  falling back to claude-code; reviewer config stays independent of the
  worker override (#5, #6).
- Drop the `ao review` CLI for this PR's scope (#2, #3).

Regenerated OpenAPI + TS types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(review): restore ao review submit (records verdict+body in AO)

Per maintainer request, bring back `ao review submit`. AO records the
reviewer's verdict and body on the review_run and marks the pass complete;
it does not post to GitHub — the reviewer agent posts its review to the PR
itself.

- storage: add review_run.body (0011), persist via Insert/UpdateReviewRunResult.
- service: restore Submit (no SCM poster) storing verdict + body.
- http: restore POST /sessions/{id}/reviews/submit + SubmitReviewInput.
- cli: ao review submit [worker] --verdict --body (worker from arg/--session/$AO_REVIEW_WORKER).
- runner: reviewer prompt instructs posting to GitHub and recording via ao review submit.

Regenerated OpenAPI + TS types.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): move reviewer runner to its own package; sharpen prompt

Per PR #197 review:
- Move the concrete reviewer runner out of the service layer into a new
  internal/review_runner package (package reviewrunner), beside other
  orchestration packages like session_manager. The service keeps only the
  Runner interface + RunSpec it depends on; the agent-resolver + runtime
  launch flow lives in review_runner.
- Sharpen the reviewer prompt: tell the agent to diff against the PR base,
  focus on high-confidence findings, post via `gh pr review`, and record
  the result with `ao review submit`; review-only (no commits/edits).
- Add unit tests for the runner.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): simplify review_run schema; provider-agnostic reviewer prompt

Per PR #197 review:
- review_run: status default 'running' (drop 'pending'), drop CHECK
  constraints on status/verdict, drop the updated_at column and the
  session/iteration index. Propagated through queries, domain, store,
  service, and tests.
- Reviewer prompt no longer hardcodes GitHub/gh commands — it instructs the
  agent to use whatever review tooling the provider offers, keeping the
  flow extensible across SCM providers.

Regenerated sqlc + OpenAPI/TS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): launch reviewer before persisting the run

Trigger now spawns the reviewer agent first and then writes the review_run
with a status derived from the launch outcome (running on success, failed
if it never started), instead of inserting a running row and correcting it
to failed afterwards.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): pluggable reviewer registry distinct from worker harnesses

Reviewers are now their own pluggable adapter set, separate from the worker
agent registry — adding a reviewer (claude-code today, greptile tomorrow) is
a one-line registration that does not widen the worker harness vocabulary,
and a worker harness does not automatically become a valid reviewer.

- domain.ReviewerHarness: a distinct vocabulary (AllReviewerHarnesses) with
  its own IsKnown; ReviewerConfig/Review/ReviewRun use it. ResolveReviewerHarness
  reuses the worker harness only when it is itself a supported reviewer, else
  falls back to claude-code.
- ports.Reviewer: a reviewer-specific contract (ReviewCommand → argv + env)
  that models one-shot / non-prompt CLIs natively instead of forcing every
  reviewer through the worker's interactive GetLaunchCommand(Prompt:...).
- internal/adapters/reviewer: a separate registry + resolver (mirrors the
  worker agent registry) with the claude-code reviewer adapter, which owns the
  review prompt and reuses the worker claude-code launch construction.
- review_runner resolves via the reviewer registry (not the worker
  AgentResolver) and merges AO_REVIEW_WORKER into the adapter's env.
- daemon wires the reviewer resolver. Registry/domain parity is test-enforced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* test(review): cover run-scoped reviewer submit

* fix(api): update generated review submit schema

* refactor(review): split core engine (internal/review) from API service

Move the review orchestration (Trigger/Submit/List, run-id generation,
deps, RunSpec/Runner, sentinels) into a transport-independent core package
internal/review (Engine). internal/service/review is now a thin API-flow
boundary: the controller-facing Manager interface + a Service that delegates
to the engine + error re-exports.

This keeps the service layer to API concerns and lets the same engine back a
future in-process CLI trigger without going through HTTP. review_runner now
depends on the core package; daemon builds the engine and wraps it in the
service. No API/schema changes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(review): commit-aware trigger, reviewer handle for UI, no env vars

Reworks the review trigger lifecycle and drops env-based coupling:

- review_run gains target_sha (the reviewed commit) and drops iteration.
  A repeat trigger for the same PR head short-circuits to the existing run.
- review gains reviewer_handle_id: the live reviewer pane's runtime handle,
  reused across passes and exposed in the reviews API so the UI can attach
  its terminal over /mux.
- Trigger flow: if a live reviewer pane exists and a new commit arrived,
  message it to re-review; otherwise spawn a fresh reviewer. The run is
  recorded only after the reviewer is launched.
- No environment variables: the reviewer adapter embeds the explicit
  `ao review submit --session <w> --run <id>` command in the spawn prompt
  and the re-review message. CLI submit requires --run/--session (no env
  fallbacks).
- Merge review_runner into internal/review as a Launcher (spawn/notify/alive).
- Trigger returns 201 for a new pass, 200 when reusing an existing run.

Regenerated sqlc + OpenAPI/TS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): author the reviewer prompt centrally, not in the adapter

Mirror the worker model (session_manager builds the prompt; adapters just
place it via LaunchConfig.Prompt). The reviewer prompt now lives in
internal/review/prompt.go and is passed through ports.ReviewInvocation.Prompt;
the claude-code reviewer adapter just feeds inv.Prompt to its launch command
and returns it as the re-review message. One-shot CLI reviewers may ignore it.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(review): split reviewer prompt into system+task, mirroring buildSpawnTexts

Mirror session_manager.buildSpawnTexts for the reviewer: a standing role goes
in the system prompt, the per-pass task (PR/commit + exact `ao review submit`
command) goes in the user prompt. internal/review/prompt.go now returns
(prompt, systemPrompt); both flow through ports.ReviewInvocation and the
claude-code adapter places them via LaunchConfig{Prompt, SystemPrompt}. The
re-review message reuses the per-pass prompt (role already established in the
running pane).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
Co-authored-by: Vaibhaav <user@example.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants