diff --git a/.codex/skills/qa-night-shift/SKILL.md b/.codex/skills/qa-night-shift/SKILL.md index 5eebb72..8ebda22 100644 --- a/.codex/skills/qa-night-shift/SKILL.md +++ b/.codex/skills/qa-night-shift/SKILL.md @@ -1,6 +1,6 @@ --- name: qa-night-shift -description: Use when the user wants to QA test Night Shift against a user-specified scratch repo path, install the current worktree CLI, and run an approval-gated real-provider pass to validate init/plan/start/status/report/resolve/resume behavior. +description: Use when the user wants to QA test Night Shift against a user-specified scratch repo path, install the current worktree CLI, and run an approval-gated real-provider pass to validate init/plan/start/status/report/provenance/doctor/resolve/resume behavior. --- # QA Night Shift @@ -59,7 +59,10 @@ real inference spend. - If it does look like an intentional testing target, proceed. - Even for an obvious scratch repo, do not run `night-shift plan`, `night-shift start`, `night-shift resume`, or other inference-consuming QA - steps until the user approves the presented plan. + steps until the user approves the presented plan. Read-only checks such as + `night-shift status`, `night-shift report`, `night-shift provenance`, + `night-shift doctor`, or `night-shift resume --explain` are acceptable once + the user-approved QA pass reaches the relevant state. Do not quietly assume a normal product repo is safe to use for QA. @@ -123,7 +126,10 @@ Typical flow: 5. inspect `night-shift status` 6. run `night-shift start` 7. inspect `night-shift report` -8. use `night-shift resolve` or `night-shift resume` only if the run actually +8. inspect `night-shift provenance` +9. use `night-shift doctor` or `night-shift resume --explain` before any real + resume attempt when the run was interrupted +10. 
use `night-shift resolve` or `night-shift resume` only if the run actually requires it For review-driven investigations, replace steps 3-4 with: @@ -167,6 +173,13 @@ In review-driven runs, pay attention to repo-state evidence: manual attention - whether `status` and `report` show payload-repair attempts, successes, and failures with usable artifact paths +- whether `status`, `report`, and the dashboard agree on the confidence posture + and its reasons +- whether `provenance` records the expected prompt paths, payload artifacts, + verification evidence, worktree paths, and PR linkage +- whether `doctor` classifies interrupted tasks as `safe_to_resume`, + `resume_with_warning`, `manual_attention`, or `irrecoverable` for the actual + saved repo state Use small tasks that validate the requested behavior instead of inviting large feature work. @@ -180,6 +193,7 @@ Collect evidence from: - relevant CLI output - the current report path printed by Night Shift - run journal paths under `.night-shift/runs/` +- the `provenance.json` path and any task-specific artifact paths it surfaces - relevant logs for the failing or surprising step - PR or delivery results when they happen - any verification output tied to the run diff --git a/README.md b/README.md index 977da91..81d02ab 100644 --- a/README.md +++ b/README.md @@ -31,11 +31,14 @@ night-shift plan --notes notes/today.md night-shift start night-shift status night-shift report +night-shift provenance ``` Supporting commands round out the lifecycle: - `resolve` records answers for blocked planning decisions and replans the run +- `doctor` explains whether a saved run is safe to resume and why +- `provenance` renders a per-run evidence ledger from saved artifacts - `resume` recovers an interrupted run from saved state - `plan --from-reviews` turns open Night Shift PR feedback into a fresh successor stack @@ -105,6 +108,7 @@ Inspect progress and outputs: ```sh night-shift status night-shift report +night-shift provenance ``` If 
planning blocked on manual decisions: @@ -117,6 +121,8 @@ night-shift start If Night Shift was interrupted mid-run: ```sh +night-shift doctor +night-shift resume --explain night-shift resume ``` diff --git a/docs/README.md b/docs/README.md index 41615be..c5bb0c8 100644 --- a/docs/README.md +++ b/docs/README.md @@ -14,7 +14,8 @@ If you are new to the project, start here: - [Getting Started](getting-started.md) for install, prerequisites, and the first runnable flow - [Run Lifecycle](run-lifecycle.md) for how `plan`, `start`, `resolve`, - `resume`, `plan --from-reviews`, and `reset` fit together + `resume`, `doctor`, `provenance`, `plan --from-reviews`, and `reset` fit + together - [Configuration](configuration.md) for `config.toml` profiles and override precedence - [Worktree Environments](worktree-environments.md) for @@ -40,11 +41,14 @@ night-shift plan --notes notes/today.md night-shift start night-shift status night-shift report +night-shift provenance ``` Supporting flows handle the messier parts of reality: - `resolve` records answers for manual-attention tasks and replans in place +- `doctor` explains whether an interrupted run looks safe to resume +- `provenance` prints the run's evidence ledger - `resume` reattaches to an interrupted run - `plan --from-reviews` turns open Night Shift PR feedback into a fresh successor stack - `reset` removes Night Shift state and tracked task worktrees, but does not touch local branches or remote PRs diff --git a/docs/getting-started.md b/docs/getting-started.md index 20b1dd0..681787a 100644 --- a/docs/getting-started.md +++ b/docs/getting-started.md @@ -121,11 +121,12 @@ Use these commands while a run is active or after it finishes: ```sh night-shift status night-shift report +night-shift provenance ``` -`status` prints the current run state, planning and execution agent summaries, -notes source, event count, and report location. `report` prints the current -markdown report directly. 
+`status` prints the current run state, confidence posture, provenance path, +and report location. `report` prints the current markdown report directly, and +`provenance` prints the run's evidence ledger from the saved artifact graph. ## Supporting Flows @@ -140,10 +141,16 @@ night-shift start If a run was interrupted, resume from the saved journal: ```sh +night-shift doctor +night-shift resume --explain night-shift resume night-shift resume --ui ``` +`doctor` is the dry recovery pass. It classifies each task as +`safe_to_resume`, `resume_with_warning`, `manual_attention`, or +`irrecoverable` before you mutate any run state. + If open Night Shift pull requests received feedback and you want a fresh replacement stack instead of in-place edits: diff --git a/docs/index.md b/docs/index.md index e91a4e2..d746ce3 100644 --- a/docs/index.md +++ b/docs/index.md @@ -33,9 +33,10 @@ night-shift report ``` Use `resolve` when planning needs human decisions, `resume` when a run was -interrupted, `plan --from-reviews` when open Night Shift PRs need a fresh -successor stack, and `reset` when you need to eject the repo-local control -plane and start over. +interrupted, `doctor` or `resume --explain` when you want a dry recovery read, +`plan --from-reviews` when open Night Shift PRs need a fresh successor stack, +and `reset` when you need to eject the repo-local control plane and start +over. ## Repository diff --git a/docs/run-lifecycle.md b/docs/run-lifecycle.md index 6e91606..3d6842d 100644 --- a/docs/run-lifecycle.md +++ b/docs/run-lifecycle.md @@ -58,6 +58,8 @@ and the next action becomes `night-shift start`. `resume` is the recovery path for an interrupted run: ```sh +night-shift doctor +night-shift resume --explain night-shift resume night-shift resume --run run-123 --ui ``` @@ -66,6 +68,11 @@ Night Shift reloads the saved run, validates the saved environment, recovers in-flight tasks, and continues orchestration. 
It does not re-resolve provider or environment settings; it reuses what the run journal already saved. +`doctor` and `resume --explain` are the read-only recovery surfaces. They +inspect the saved run, active lock, worktrees, logs, review drift, and +interrupted task states, then classify each task as `safe_to_resume`, +`resume_with_warning`, `manual_attention`, or `irrecoverable`. + ## Review-Driven Replanning Review feedback re-enters Night Shift through `plan --from-reviews`: @@ -99,6 +106,21 @@ it recomputes drift against the current PR tree when the run has a stored review snapshot, while the on-disk `report.md` remains the stable persisted artifact for the run. +## Provenance + +`provenance` is the operator-facing evidence ledger for a run: + +```sh +night-shift provenance +night-shift provenance --run run-123 --format json +night-shift provenance --task task-1 +``` + +Night Shift persists `./.night-shift/runs//provenance.json` alongside +`report.md`. The command normalizes the run journal, prompt artifacts, logs, +payload-repair traces, verification artifacts, worktree paths, and confidence +posture into one inspectable view. 
+ ## Reset `reset` is the eject handle when the repo-local control plane has to go: @@ -141,6 +163,7 @@ Night Shift binds to `127.0.0.1`, prefers port `8787`, and serves: - run history for the current repository - run summary metadata - repo-state summary for review-driven runs, including open PR counts and drift +- confidence posture and provenance path - task status - event timeline - report content diff --git a/docs/state-and-artifacts.md b/docs/state-and-artifacts.md index 6418fee..1dbb2ad 100644 --- a/docs/state-and-artifacts.md +++ b/docs/state-and-artifacts.md @@ -33,6 +33,7 @@ Each run directory contains durable state for one run: - `state.json` - `events.jsonl` - `report.md` +- `provenance.json` - `logs/` - `worktrees/` @@ -53,6 +54,11 @@ The run record itself stores: - task list and task states - timestamps and current run status +`provenance.json` is the normalized evidence ledger for the run. It reuses the +saved run state plus artifact paths under `logs/` to record planning +provenance, prompt and payload traces, verification evidence, touched files, +worktree paths, PR linkage, and confidence posture. + ## Planning Artifacts Planning writes artifacts under `./.night-shift/planning//`. Those @@ -88,6 +94,10 @@ review-driven runs: it refreshes repo-state drift against the current open PR tree when a stored snapshot exists, so its live output is authoritative for current drift while `report.md` remains durable and offline-readable. +Likewise, the persisted `provenance.json` is the stable audit artifact for the +run, while `night-shift provenance` can render the same evidence in markdown or +refresh live review drift in JSON output. + Task-level provider logs and prompt files live under each run's `logs/` directory. 
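The documentation changes above describe `doctor` and `resume --explain` as read-only recovery passes that classify each task as `safe_to_resume`, `resume_with_warning`, `manual_attention`, or `irrecoverable`. A minimal sketch of how an operator script might gate a real `night-shift resume` on those labels follows. The `recovery_gate` helper is hypothetical and not part of this change. It assumes the rendered doctor output contains the classification labels verbatim; the diff does not show the actual output layout.

```sh
# Hypothetical helper, not part of Night Shift.
# ASSUMPTION: doctor output mentions the classification labels as plain text.
#
# recovery_gate reads a doctor report on stdin and prints one verdict:
#   stop    - at least one task is `manual_attention` or `irrecoverable`
#   review  - no blocking task, but at least one is `resume_with_warning`
#   resume  - only `safe_to_resume` classifications were seen
#   unknown - no classification label was found at all
recovery_gate() {
  report=$(cat)
  case "$report" in
    *manual_attention*|*irrecoverable*) echo "stop" ;;
    *resume_with_warning*) echo "review" ;;
    *safe_to_resume*) echo "resume" ;;
    *) echo "unknown" ;;
  esac
}
```

Under those assumptions, a typical use would be `night-shift doctor --run run-123 | recovery_gate`, running `night-shift resume` only when the verdict is `resume`. The most severe label is checked first, so one blocked task outweighs any number of safe ones.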
diff --git a/src/night_shift/app.gleam b/src/night_shift/app.gleam index 46dd88c..cf60390 100644 --- a/src/night_shift/app.gleam +++ b/src/night_shift/app.gleam @@ -25,8 +25,10 @@ import night_shift/repo_state_runtime import night_shift/report import night_shift/system import night_shift/types +import night_shift/usecase/doctor as doctor_usecase import night_shift/usecase/init as init_usecase import night_shift/usecase/plan as plan_usecase +import night_shift/usecase/provenance as provenance_usecase import night_shift/usecase/render as usecase_render import night_shift/usecase/reset as reset_usecase import night_shift/usecase/resolve as resolve_usecase @@ -131,9 +133,13 @@ fn run_initialized_command( types.Start(run, True) -> start_with_ui(repo_root, run, config) types.Status(run) -> io.println(status(repo_root, run, config)) types.Report(run) -> io.println(report(repo_root, run, config)) + types.Provenance(run, task_id, format) -> + io.println(provenance(repo_root, run, task_id, format, config)) + types.Doctor(run) -> io.println(doctor(repo_root, run, config)) types.Resolve(run) -> io.println(resolve(repo_root, run, config)) - types.Resume(run, False) -> io.println(resume(repo_root, run, config)) - types.Resume(run, True) -> resume_with_ui(repo_root, run, config) + types.Resume(run, False, False) -> io.println(resume(repo_root, run, config)) + types.Resume(run, True, False) -> resume_with_ui(repo_root, run, config) + types.Resume(run, False, True) -> io.println(doctor(repo_root, run, config)) _ -> io.println("Unsupported command.") } } @@ -276,6 +282,30 @@ fn resume( } } +fn doctor( + repo_root: String, + run: types.RunSelector, + config: types.Config, +) -> String { + case doctor_usecase.execute(repo_root, run, config) { + Ok(rendered) -> rendered + Error(message) -> message + } +} + +fn provenance( + repo_root: String, + run: types.RunSelector, + task_id: Option(String), + format: types.ProvenanceFormat, + config: types.Config, +) -> String { + case 
provenance_usecase.execute(repo_root, run, task_id, format, config) { + Ok(rendered) -> rendered + Error(message) -> message + } +} + fn stringify_notifiers(notifiers: List(types.NotifierName)) -> String { notifiers |> list.map(types.notifier_to_string) diff --git a/src/night_shift/cli.gleam b/src/night_shift/cli.gleam index 77ef351..5694ba1 100644 --- a/src/night_shift/cli.gleam +++ b/src/night_shift/cli.gleam @@ -18,8 +18,10 @@ pub fn usage() -> String { <> " start [--run |latest] [--ui]\n" <> " status [--run |latest]\n" <> " report [--run |latest]\n" + <> " provenance [--run |latest] [--task ] [--format ]\n" + <> " doctor [--run |latest]\n" <> " resolve [--run |latest]\n" - <> " resume [--run |latest] [--ui]\n" + <> " resume [--run |latest] [--ui|--explain]\n" } /// Parse raw command-line arguments into a `Command`. @@ -48,6 +50,8 @@ pub fn parse(args: List(String)) -> Result(types.Command, String) { ["start", ..rest] -> parse_start(rest) ["status", ..rest] -> parse_run_lookup(rest, types.Status) ["report", ..rest] -> parse_run_lookup(rest, types.Report) + ["provenance", ..rest] -> parse_provenance(rest) + ["doctor", ..rest] -> parse_run_lookup(rest, types.Doctor) ["resolve", ..rest] -> parse_run_lookup(rest, types.Resolve) ["resume", ..rest] -> parse_resume(rest) ["review", ..] 
-> @@ -256,21 +260,56 @@ fn parse_start_flags( } fn parse_resume(args: List(String)) -> Result(types.Command, String) { - parse_resume_flags(args, types.LatestRun, False) + parse_resume_flags(args, types.LatestRun, False, False) } fn parse_resume_flags( args: List(String), run: types.RunSelector, ui_enabled: Bool, + explain_only: Bool, ) -> Result(types.Command, String) { case args { - [] -> Ok(types.Resume(run, ui_enabled)) + [] -> + case ui_enabled && explain_only { + True -> Error("`resume --explain` cannot be combined with `--ui`.") + False -> Ok(types.Resume(run, ui_enabled, explain_only)) + } + ["--run", "latest", ..rest] -> + parse_resume_flags(rest, types.LatestRun, ui_enabled, explain_only) + ["--run", run_id, ..rest] -> + parse_resume_flags(rest, types.RunId(run_id), ui_enabled, explain_only) + ["--ui", ..rest] -> parse_resume_flags(rest, run, True, explain_only) + ["--explain", ..rest] -> + parse_resume_flags(rest, run, ui_enabled, True) + [flag, ..] -> Error("Unsupported flag: " <> flag) + } +} + +fn parse_provenance(args: List(String)) -> Result(types.Command, String) { + parse_provenance_flags(args, types.LatestRun, None, types.ProvenanceMarkdown) +} + +fn parse_provenance_flags( + args: List(String), + run: types.RunSelector, + task_id: Option(String), + format: types.ProvenanceFormat, +) -> Result(types.Command, String) { + case args { + [] -> Ok(types.Provenance(run, task_id, format)) ["--run", "latest", ..rest] -> - parse_resume_flags(rest, types.LatestRun, ui_enabled) + parse_provenance_flags(rest, types.LatestRun, task_id, format) ["--run", run_id, ..rest] -> - parse_resume_flags(rest, types.RunId(run_id), ui_enabled) - ["--ui", ..rest] -> parse_resume_flags(rest, run, True) + parse_provenance_flags(rest, types.RunId(run_id), task_id, format) + ["--task", next_task_id, ..rest] -> + parse_provenance_flags(rest, run, Some(next_task_id), format) + ["--format", "json", ..rest] -> + parse_provenance_flags(rest, run, task_id, types.ProvenanceJson) + 
["--format", "md", ..rest] -> + parse_provenance_flags(rest, run, task_id, types.ProvenanceMarkdown) + ["--format", raw_format, ..] -> + Error("Unsupported provenance format: " <> raw_format) [flag, ..] -> Error("Unsupported flag: " <> flag) } } diff --git a/src/night_shift/dashboard.gleam b/src/night_shift/dashboard.gleam index 3c625c0..03cbd2b 100644 --- a/src/night_shift/dashboard.gleam +++ b/src/night_shift/dashboard.gleam @@ -5,6 +5,8 @@ import gleam/list import gleam/option.{type Option, None, Some} import gleam/result import night_shift/config +import night_shift/domain/confidence +import night_shift/domain/provenance import night_shift/domain/review_run_projection import night_shift/journal import night_shift/project @@ -132,7 +134,7 @@ pub fn index_html(initial_run_id: String) -> String { <> " ['Run ID', run.run_id], ['Status', run.status], ['Planning profile', run.planning_agent.profile_name], ['Planning provider', run.planning_agent.provider],\n" <> " ['Planning model', run.planning_agent.model || 'default'], ['Planning reasoning', run.planning_agent.reasoning || 'default'], ['Execution profile', run.execution_agent.profile_name], ['Execution provider', run.execution_agent.provider],\n" <> " ['Execution model', run.execution_agent.model || 'default'], ['Execution reasoning', run.execution_agent.reasoning || 'default'], ['Repo', run.repo_root], ['Created', run.created_at],\n" - <> " ['Updated', run.updated_at], ['Brief', run.brief_path], ['Max workers', String(run.max_workers)]\n" + <> " ['Updated', run.updated_at], ['Brief', run.brief_path], ['Provenance', run.provenance_path], ['Confidence', run.confidence_posture], ['Confidence reasons', (run.confidence_reasons || []).join(' | ') || '—'], ['Max workers', String(run.max_workers)]\n" <> " ];\n" <> " if (run.repo_state) {\n" <> " fields.push(['Open PRs', String(run.repo_state.open_pr_count)]);\n" @@ -228,10 +230,11 @@ pub fn run_json(repo_root: String, run_id: String) -> Result(String, String) { let 
repo_state_view = load_repo_state_view(run) let review_projection = review_run_projection.build(run, events, repo_state_view) + let confidence_assessment = confidence.assess(run, events, repo_state_view) let rendered_report = report.render_live(run, events, repo_state_view) Ok( json.object([ - #("run", run_detail_json(run, review_projection)), + #("run", run_detail_json(run, review_projection, confidence_assessment)), #("events", json.array(events, event_json)), #("report", json.string(rendered_report)), ]) @@ -254,6 +257,7 @@ fn run_summary_json(run: types.RunRecord) -> json.Json { fn run_detail_json( run: types.RunRecord, review_projection: Option(review_run_projection.ReviewRunProjection), + confidence_assessment: types.ConfidenceAssessment, ) -> json.Json { json.object([ #("run_id", json.string(run.run_id)), @@ -261,8 +265,19 @@ fn run_detail_json( #("run_path", json.string(run.run_path)), #("brief_path", json.string(run.brief_path)), #("report_path", json.string(run.report_path)), + #("provenance_path", json.string(provenance.artifact_path(run))), #("planning_agent", agent_json(run.planning_agent)), #("execution_agent", agent_json(run.execution_agent)), + #( + "confidence_posture", + json.string(types.confidence_posture_to_string( + confidence_assessment.posture, + )), + ), + #( + "confidence_reasons", + json.array(confidence_assessment.reasons, json.string), + ), #("max_workers", json.int(run.max_workers)), #("status", json.string(types.run_status_to_string(run.status))), #("created_at", json.string(run.created_at)), diff --git a/src/night_shift/domain/confidence.gleam b/src/night_shift/domain/confidence.gleam new file mode 100644 index 0000000..37e2f19 --- /dev/null +++ b/src/night_shift/domain/confidence.gleam @@ -0,0 +1,265 @@ +import gleam/int +import gleam/list +import gleam/option.{type Option, None, Some} +import gleam/string +import night_shift/repo_state_runtime +import night_shift/types +import simplifile + +pub fn assess( + run: types.RunRecord, + 
events: List(types.RunEvent), + repo_state_view: Option(repo_state_runtime.RepoStateView), +) -> types.ConfidenceAssessment { + let severe = severe_reasons(run, events) + let moderate = moderate_reasons(events, repo_state_view) + let positive = positive_reasons(run, events) + + case severe { + [_, ..] -> + types.ConfidenceAssessment( + posture: types.ConfidenceLow, + reasons: take_first(list.append(severe, moderate), 4), + ) + [] -> + case moderate { + [_, ..] -> + types.ConfidenceAssessment( + posture: types.ConfidenceGuarded, + reasons: take_first(list.append(moderate, positive), 4), + ) + [] -> + types.ConfidenceAssessment( + posture: types.ConfidenceHigh, + reasons: case positive { + [] -> ["No elevated-risk signals are recorded for this run."] + _ -> take_first(positive, 4) + }, + ) + } + } +} + +pub fn reasons_summary(assessment: types.ConfidenceAssessment) -> String { + case assessment.reasons { + [] -> "none" + reasons -> string.join(reasons, with: " | ") + } +} + +fn severe_reasons( + run: types.RunRecord, + events: List(types.RunEvent), +) -> List(String) { + let manual_attention_count = + run.tasks + |> list.filter(fn(task) { + types.task_requires_manual_attention(run.decisions, task) + }) + |> list.length + let failed_count = + run.tasks + |> list.filter(fn(task) { task.state == types.Failed }) + |> list.length + let missing_worktrees = + run.tasks + |> list.filter(fn(task) { task.worktree_path != "" }) + |> list.filter(fn(task) { !directory_exists(task.worktree_path) }) + |> list.length + let payload_repair_failures = event_count(events, "execution_payload_repair_failed") + let run_failed = event_count(events, "run_failed") + + [ + latest_environment_preflight_failure(events) + |> option_reason("Environment bootstrap failed."), + count_reason( + manual_attention_count, + "manual-attention task is still unresolved.", + "manual-attention tasks are still unresolved.", + ), + count_reason( + unresolved_decision_requests_count(run), + "operator decision is 
still unresolved.", + "operator decisions are still unresolved.", + ), + count_reason(failed_count, "task failed.", "tasks failed."), + count_reason( + payload_repair_failures, + "payload repair failed.", + "payload repairs failed.", + ), + count_reason( + missing_worktrees, + "retained worktree is missing from disk.", + "retained worktrees are missing from disk.", + ), + count_reason(run_failed, "run failure was recorded.", "run failures were recorded."), + ] + |> list.filter_map(identity_reason) +} + +fn moderate_reasons( + events: List(types.RunEvent), + repo_state_view: Option(repo_state_runtime.RepoStateView), +) -> List(String) { + let payload_warnings = event_count(events, "execution_payload_warning") + let payload_repairs = event_count(events, "execution_payload_repair_succeeded") + let prune_warnings = event_count(events, "worktree_prune_warning") + let supersession_warnings = event_count(events, "review_supersession_warning") + let repo_state_reason = case repo_state_view { + Some(view) -> + case view.drift { + repo_state_runtime.RepoStateDrifted -> + Some("Review snapshot drifted since planning.") + repo_state_runtime.RepoStateDriftUnknown(_) -> + Some("Live review snapshot refresh is unavailable.") + _ -> None + } + None -> None + } + + [ + count_reason( + payload_warnings, + "recovered execution payload was accepted.", + "recovered execution payloads were accepted.", + ), + count_reason( + payload_repairs, + "JSON-only payload repair succeeded.", + "JSON-only payload repairs succeeded.", + ), + count_reason( + prune_warnings, + "worktree prune warning was recorded.", + "worktree prune warnings were recorded.", + ), + count_reason( + supersession_warnings, + "review supersession warning was recorded.", + "review supersession warnings were recorded.", + ), + repo_state_reason, + ] + |> list.filter_map(identity_reason) +} + +fn positive_reasons( + run: types.RunRecord, + events: List(types.RunEvent), +) -> List(String) { + let pr_opened = 
event_count(events, "pr_opened") + let verified = event_count(events, "task_verified") + let retained_worktrees = + run.tasks + |> list.filter(fn(task) { task.worktree_path != "" }) + |> list.length + let retained_and_present = + retained_worktrees > 0 + && list.all( + run.tasks + |> list.filter(fn(task) { task.worktree_path != "" }), + fn(task) { directory_exists(task.worktree_path) }, + ) + + [ + case verified > 0 { + True -> Some("Verification passed for delivered task work.") + False -> None + }, + case pr_opened > 0 { + True -> Some("Delivered pull requests are recorded in the journal.") + False -> None + }, + case unresolved_decision_requests_count(run) == 0 { + True -> Some("No outstanding operator decisions remain.") + False -> None + }, + case retained_and_present { + True -> Some("Retained worktrees remain mounted for inspection and recovery.") + False -> None + }, + ] + |> list.filter_map(identity_reason) +} + +fn unresolved_decision_requests_count(run: types.RunRecord) -> Int { + run.tasks + |> list.filter(fn(task) { + types.task_requires_manual_attention(run.decisions, task) + }) + |> list.map(fn(task) { + list.length(types.unresolved_decision_requests(run.decisions, task)) + }) + |> list.fold(0, fn(total, count) { total + count }) +} + +fn event_count(events: List(types.RunEvent), kind: String) -> Int { + events + |> list.filter(fn(event) { event.kind == kind }) + |> list.length +} + +fn latest_environment_preflight_failure( + events: List(types.RunEvent), +) -> Option(String) { + latest_environment_preflight_failure_loop(list.reverse(events)) +} + +fn latest_environment_preflight_failure_loop( + events: List(types.RunEvent), +) -> Option(String) { + case events { + [] -> None + [event, ..rest] -> + case event.kind == "environment_preflight_failed" { + True -> Some(event.message) + False -> latest_environment_preflight_failure_loop(rest) + } + } +} + +fn directory_exists(path: String) -> Bool { + case simplifile.read_directory(at: path) { + Ok(_) -> 
True + Error(_) -> False + } +} + +fn option_reason(value: Option(a), message: String) -> Option(String) { + case value { + Some(_) -> Some(message) + None -> None + } +} + +fn count_reason(count: Int, singular: String, plural: String) -> Option(String) { + case count { + 0 -> None + 1 -> Some("1 " <> singular) + _ -> Some(int.to_string(count) <> " " <> plural) + } +} + +fn identity_reason(value: Option(String)) -> Result(String, Nil) { + case value { + Some(reason) -> Ok(reason) + None -> Error(Nil) + } +} + +fn take_first(values: List(String), limit: Int) -> List(String) { + take_first_loop(values, limit, []) +} + +fn take_first_loop( + values: List(String), + remaining: Int, + acc: List(String), +) -> List(String) { + case values, remaining <= 0 { + _, True -> list.reverse(acc) + [], False -> list.reverse(acc) + [value, ..rest], False -> take_first_loop(rest, remaining - 1, [value, ..acc]) + } +} diff --git a/src/night_shift/domain/provenance.gleam b/src/night_shift/domain/provenance.gleam new file mode 100644 index 0000000..1cd43e3 --- /dev/null +++ b/src/night_shift/domain/provenance.gleam @@ -0,0 +1,590 @@ +import filepath +import gleam/json +import gleam/list +import gleam/option.{type Option, None, Some} +import gleam/result +import gleam/string +import night_shift/config +import night_shift/domain/confidence +import night_shift/project +import night_shift/repo_state_runtime +import night_shift/types +import simplifile + +pub fn artifact_path(run: types.RunRecord) -> String { + filepath.join(run.run_path, "provenance.json") +} + +pub fn write_persisted( + run: types.RunRecord, + events: List(types.RunEvent), +) -> Result(Nil, String) { + let verification_commands = load_verification_commands(run.repo_root) + let repo_state_view = None + let assessment = confidence.assess(run, events, repo_state_view) + let manifest = + manifest_json(run, events, repo_state_view, verification_commands, assessment) + write_file(artifact_path(run), json.to_string(manifest)) +} 
+ +pub fn render( + run: types.RunRecord, + events: List(types.RunEvent), + repo_state_view: Option(repo_state_runtime.RepoStateView), + task_filter: Option(String), + format: types.ProvenanceFormat, + verification_commands: List(String), +) -> Result(String, String) { + use filtered_tasks <- result.try(filter_tasks(run.tasks, task_filter)) + let filtered_run = types.RunRecord(..run, tasks: filtered_tasks) + let filtered_events = filter_events(events, task_filter) + let assessment = confidence.assess(filtered_run, filtered_events, repo_state_view) + + Ok(case format { + types.ProvenanceJson -> + manifest_json( + filtered_run, + filtered_events, + repo_state_view, + verification_commands, + assessment, + ) + |> json.to_string + types.ProvenanceMarkdown -> + render_markdown( + filtered_run, + filtered_events, + repo_state_view, + verification_commands, + assessment, + ) + }) +} + +fn render_markdown( + run: types.RunRecord, + events: List(types.RunEvent), + repo_state_view: Option(repo_state_runtime.RepoStateView), + verification_commands: List(String), + assessment: types.ConfidenceAssessment, +) -> String { + [ + "# Night Shift Provenance", + "", + "## Run", + "- Run ID: " <> run.run_id, + "- Status: " <> types.run_status_to_string(run.status), + "- Brief: " <> run.brief_path, + "- Report: " <> run.report_path, + "- Provenance artifact: " <> artifact_path(run), + "- Confidence posture: " + <> types.confidence_posture_to_string(assessment.posture), + "- Confidence reasons: " <> confidence.reasons_summary(assessment), + "- Planning provenance: " <> render_planning_provenance(run.planning_provenance), + "- Notes source: " <> render_notes_source(run.notes_source), + "- Planning artifacts: " <> render_string_list(planning_artifact_paths(run, events)), + "- Planner prompt: " <> render_optional_path(planner_prompt_path(run.run_path)), + "- Planner log: " <> render_optional_path(planner_log_path(run.run_path)), + render_review_state_markdown(run, repo_state_view), + "", + 
"## Tasks", + render_task_sections(run, events, verification_commands), + "", + "## Event References", + render_event_refs(events), + ] + |> list.filter(fn(line) { line != "" }) + |> string.join(with: "\n") +} + +fn manifest_json( + run: types.RunRecord, + events: List(types.RunEvent), + repo_state_view: Option(repo_state_runtime.RepoStateView), + verification_commands: List(String), + assessment: types.ConfidenceAssessment, +) -> json.Json { + json.object([ + #( + "run", + json.object([ + #("run_id", json.string(run.run_id)), + #("status", json.string(types.run_status_to_string(run.status))), + #("repo_root", json.string(run.repo_root)), + #("run_path", json.string(run.run_path)), + #("brief_path", json.string(run.brief_path)), + #("report_path", json.string(run.report_path)), + #("provenance_path", json.string(artifact_path(run))), + #("planning_agent", agent_json(run.planning_agent)), + #("execution_agent", agent_json(run.execution_agent)), + #("planning_provenance", json.string(render_planning_provenance( + run.planning_provenance, + ))), + #("notes_source", json.string(render_notes_source(run.notes_source))), + #( + "planning_artifacts", + json.array(planning_artifact_paths(run, events), json.string), + ), + #( + "planner_prompt_path", + json.nullable(from: planner_prompt_path(run.run_path), of: json.string), + ), + #( + "planner_log_path", + json.nullable(from: planner_log_path(run.run_path), of: json.string), + ), + ]), + ), + #( + "confidence_posture", + json.object([ + #( + "level", + json.string(types.confidence_posture_to_string(assessment.posture)), + ), + #("reasons", json.array(assessment.reasons, json.string)), + ]), + ), + #( + "review_state", + json.nullable( + from: review_state_json(run, repo_state_view), + of: identity_json, + ), + ), + #("tasks", json.array(run.tasks, task_json(_, run, events, verification_commands))), + #("event_refs", json.array(events, event_ref_json)), + ]) +} + +fn task_json( + task: types.Task, + run: types.RunRecord, + 
events: List(types.RunEvent), + verification_commands: List(String), +) -> json.Json { + let relevant_events = + events + |> list.filter(fn(event) { event.task_id == Some(task.id) }) + let verification_log = existing_file(verification_log_path(run.run_path, task.id)) + + json.object([ + #("id", json.string(task.id)), + #("title", json.string(task.title)), + #("state", json.string(types.task_state_to_string(task.state))), + #("summary", json.string(task.summary)), + #("worktree_path", json.string(task.worktree_path)), + #("branch_name", json.string(task.branch_name)), + #("pr_number", json.string(task.pr_number)), + #( + "superseded_pr_numbers", + json.array(task.superseded_pr_numbers, json.int), + ), + #("files_touched", json.array(parse_changed_files(task.summary), json.string)), + #( + "verification", + json.object([ + #("commands", json.array(verification_commands, json.string)), + #( + "outcome", + json.string(verification_outcome(task, relevant_events)), + ), + #( + "log_path", + json.nullable(from: verification_log, of: json.string), + ), + ]), + ), + #( + "artifacts", + json.object([ + #("prompt_paths", json.array(task_prompt_paths(run.run_path, task.id), json.string)), + #("log_paths", json.array(task_log_paths(run.run_path, task.id), json.string)), + #("raw_payload_paths", json.array(raw_payload_paths(run.run_path, task.id), json.string)), + #( + "sanitized_payload_paths", + json.array(sanitized_payload_paths(run.run_path, task.id), json.string), + ), + ]), + ), + #("event_refs", json.array(relevant_events, event_ref_json)), + ]) +} + +fn review_state_json( + run: types.RunRecord, + repo_state_view: Option(repo_state_runtime.RepoStateView), +) -> Option(json.Json) { + case run.repo_state_snapshot { + None -> None + Some(snapshot) -> + Some(json.object([ + #("snapshot_captured_at", json.string(snapshot.captured_at)), + #("captured_open_pr_count", json.int(list.length(snapshot.open_pull_requests))), + #( + "captured_actionable_pr_count", + json.int( + 
snapshot.open_pull_requests + |> list.filter(fn(pr) { pr.actionable }) + |> list.length, + ), + ), + #( + "drift", + json.string(case repo_state_view { + Some(view) -> repo_state_runtime.drift_label(view.drift) + None -> "unknown" + }), + ), + ])) + } +} + +fn render_task_sections( + run: types.RunRecord, + events: List(types.RunEvent), + verification_commands: List(String), +) -> String { + case run.tasks { + [] -> "- No tasks matched the provenance request." + _ -> + run.tasks + |> list.map(fn(task) { + let relevant_events = + events + |> list.filter(fn(event) { event.task_id == Some(task.id) }) + [ + "- " + <> task.id + <> " (" + <> types.task_state_to_string(task.state) + <> ")", + " Branch: " <> render_empty_as_dash(task.branch_name), + " PR: " <> render_empty_as_dash(task.pr_number), + " Worktree: " <> render_empty_as_dash(task.worktree_path), + " Files touched: " <> render_string_list(parse_changed_files(task.summary)), + " Verification commands: " <> render_string_list(verification_commands), + " Verification outcome: " <> verification_outcome(task, relevant_events), + " Prompt artifacts: " <> render_string_list(task_prompt_paths(run.run_path, task.id)), + " Log artifacts: " <> render_string_list(task_log_paths(run.run_path, task.id)), + " Raw payloads: " <> render_string_list(raw_payload_paths(run.run_path, task.id)), + " Sanitized payloads: " + <> render_string_list(sanitized_payload_paths(run.run_path, task.id)), + " Event refs: " + <> render_string_list( + relevant_events |> list.map(render_event_ref_label), + ), + ] + |> string.join(with: "\n") + }) + |> string.join(with: "\n") + } +} + +fn render_event_refs(events: List(types.RunEvent)) -> String { + case events { + [] -> "- No events recorded yet." 
+ _ -> + events + |> list.map(fn(event) { + "- " + <> render_event_ref_label(event) + <> " " + <> string.replace(in: event.message, each: "\n", with: " ") + }) + |> string.join(with: "\n") + } +} + +fn render_review_state_markdown( + run: types.RunRecord, + repo_state_view: Option(repo_state_runtime.RepoStateView), +) -> String { + case review_state_json(run, repo_state_view) { + Some(_) -> + "## Review State\n" + <> "- Snapshot captured: " + <> case run.repo_state_snapshot { + Some(snapshot) -> snapshot.captured_at + None -> "—" + } + <> "\n- Drift: " + <> case repo_state_view { + Some(view) -> repo_state_runtime.drift_label(view.drift) + None -> "unknown" + } + None -> "" + } +} + +fn filter_tasks( + tasks: List(types.Task), + task_filter: Option(String), +) -> Result(List(types.Task), String) { + case task_filter { + None -> Ok(tasks) + Some(task_id) -> + case list.filter(tasks, fn(task) { task.id == task_id }) { + [] -> Error("No task matched provenance filter `" <> task_id <> "`.") + filtered -> Ok(filtered) + } + } +} + +fn filter_events( + events: List(types.RunEvent), + task_filter: Option(String), +) -> List(types.RunEvent) { + case task_filter { + None -> events + Some(task_id) -> + events + |> list.filter(fn(event) { + event.task_id == Some(task_id) || event.task_id == None + }) + } +} + +fn load_verification_commands(repo_root: String) -> List(String) { + case config.load(project.config_path(repo_root)) { + Ok(loaded_config) -> loaded_config.verification_commands + Error(_) -> [] + } +} + +fn planning_artifact_paths( + run: types.RunRecord, + events: List(types.RunEvent), +) -> List(String) { + let event_paths = + events + |> list.filter(fn(event) { event.kind == "planning_artifacts_recorded" }) + |> list.map(fn(event) { event.message }) + |> list.filter_map(extract_path_from_event) + + let candidate_paths = case run.notes_source { + Some(types.InlineNotes(path)) -> [path, ..event_paths] + _ -> event_paths + } + + candidate_paths + |> 
list.filter(file_or_directory_exists) +} + +fn extract_path_from_event(message: String) -> Result(String, Nil) { + case string.split_once(message, "Planning artifacts: ") { + Ok(#(_, path)) -> Ok(string.trim(path)) + Error(_) -> Error(Nil) + } +} + +fn planner_prompt_path(run_path: String) -> Option(String) { + existing_file(filepath.join(run_path, "planner.prompt.md")) +} + +fn planner_log_path(run_path: String) -> Option(String) { + existing_file(filepath.join(run_path, "logs/planner.log")) +} + +fn task_prompt_paths(run_path: String, task_id: String) -> List(String) { + [ + filepath.join(run_path, "logs/" <> task_id <> ".prompt.md"), + filepath.join(run_path, "logs/" <> task_id <> ".repair.prompt.md"), + filepath.join(run_path, "logs/" <> task_id <> ".payload-repair.prompt.md"), + ] + |> existing_files +} + +fn task_log_paths(run_path: String, task_id: String) -> List(String) { + [ + execution_log_path(run_path, task_id), + repair_log_path(run_path, task_id), + payload_repair_log_path(run_path, task_id), + verification_log_path(run_path, task_id), + filepath.join(run_path, "logs/" <> task_id <> ".git.log"), + filepath.join(run_path, "logs/" <> task_id <> ".env.log"), + ] + |> existing_files +} + +fn raw_payload_paths(run_path: String, task_id: String) -> List(String) { + [ + filepath.join(run_path, "logs/" <> task_id <> ".result.raw.jsonish"), + filepath.join(run_path, "logs/" <> task_id <> ".payload-repair.result.raw.jsonish"), + ] + |> existing_files +} + +fn sanitized_payload_paths(run_path: String, task_id: String) -> List(String) { + [ + filepath.join(run_path, "logs/" <> task_id <> ".result.sanitized.json"), + filepath.join(run_path, "logs/" <> task_id <> ".payload-repair.result.sanitized.json"), + ] + |> existing_files +} + +fn verification_log_path(run_path: String, task_id: String) -> String { + filepath.join(run_path, "logs/" <> task_id <> ".verify.log") +} + +fn execution_log_path(run_path: String, task_id: String) -> String { + 
filepath.join(run_path, "logs/" <> task_id <> ".log") +} + +fn repair_log_path(run_path: String, task_id: String) -> String { + filepath.join(run_path, "logs/" <> task_id <> ".repair.log") +} + +fn payload_repair_log_path(run_path: String, task_id: String) -> String { + filepath.join(run_path, "logs/" <> task_id <> ".payload-repair.log") +} + +fn verification_outcome( + task: types.Task, + events: List(types.RunEvent), +) -> String { + case list.any(events, fn(event) { event.kind == "task_verified" }) { + True -> "passed" + False -> + case task.state { + types.Failed -> + case string.contains(does: task.summary, contain: "verification failed") { + True -> "failed" + False -> "not_recorded" + } + _ -> "not_recorded" + } + } +} + +fn parse_changed_files(summary: String) -> List(String) { + case string.split_once(summary, " Changed files: ") { + Ok(#(_, changed_files)) -> + changed_files + |> string.split(",") + |> list.filter_map(fn(entry) { + case string.trim(entry) { + "" -> Error(Nil) + path -> Ok(path) + } + }) + Error(_) -> [] + } +} + +fn event_ref_json(event: types.RunEvent) -> json.Json { + json.object([ + #("event_id", json.string(event_id(event))), + #("kind", json.string(event.kind)), + #("at", json.string(event.at)), + #("task_id", case event.task_id { + Some(task_id) -> json.string(task_id) + None -> json.null() + }), + #("message", json.string(event.message)), + ]) +} + +fn render_event_ref_label(event: types.RunEvent) -> String { + event_id(event) <> "@" <> event.at +} + +fn event_id(event: types.RunEvent) -> String { + case event.task_id { + Some(task_id) -> event.kind <> ":" <> task_id + None -> event.kind <> ":run" + } +} + +fn agent_json(agent: types.ResolvedAgentConfig) -> json.Json { + json.object([ + #("profile_name", json.string(agent.profile_name)), + #("provider", json.string(types.provider_to_string(agent.provider))), + #("model", case agent.model { + Some(model) -> json.string(model) + None -> json.null() + }), + #("reasoning", case 
agent.reasoning { + Some(reasoning) -> json.string(types.reasoning_to_string(reasoning)) + None -> json.null() + }), + ]) +} + +fn render_planning_provenance( + provenance: Option(types.PlanningProvenance), +) -> String { + case provenance { + Some(value) -> types.planning_provenance_label(value) + None -> "(legacy)" + } +} + +fn render_notes_source(notes_source: Option(types.NotesSource)) -> String { + case notes_source { + Some(source) -> types.notes_source_label(source) + None -> "(none)" + } +} + +fn render_string_list(values: List(String)) -> String { + case values { + [] -> "none" + _ -> string.join(values, with: ", ") + } +} + +fn render_optional_path(path: Option(String)) -> String { + case path { + Some(value) -> value + None -> "none" + } +} + +fn render_empty_as_dash(value: String) -> String { + case string.trim(value) { + "" -> "—" + _ -> value + } +} + +fn identity_json(value: json.Json) -> json.Json { + value +} + +fn existing_files(paths: List(String)) -> List(String) { + paths + |> list.filter(file_exists) +} + +fn existing_file(path: String) -> Option(String) { + case file_exists(path) { + True -> Some(path) + False -> None + } +} + +fn file_exists(path: String) -> Bool { + case simplifile.read(path) { + Ok(_) -> True + Error(_) -> False + } +} + +fn file_or_directory_exists(path: String) -> Bool { + file_exists(path) + || case simplifile.read_directory(at: path) { + Ok(_) -> True + Error(_) -> False + } +} + +fn write_file(path: String, contents: String) -> Result(Nil, String) { + case simplifile.write(contents, to: path) { + Ok(Nil) -> Ok(Nil) + Error(error) -> + Error( + "Unable to write " <> path <> ": " <> simplifile.describe_error(error), + ) + } +} diff --git a/src/night_shift/domain/report.gleam b/src/night_shift/domain/report.gleam index 8b0b731..f710c98 100644 --- a/src/night_shift/domain/report.gleam +++ b/src/night_shift/domain/report.gleam @@ -3,6 +3,8 @@ import gleam/list import gleam/option.{type Option, None, Some} import 
gleam/string import night_shift/agent_config +import night_shift/domain/confidence +import night_shift/domain/provenance import night_shift/domain/repo_state import night_shift/domain/review_run_projection import night_shift/repo_state_runtime @@ -13,6 +15,7 @@ pub fn render( events: List(types.RunEvent), repo_state_view: Option(repo_state_runtime.RepoStateView), ) -> String { + let confidence_assessment = confidence.assess(run, events, repo_state_view) [ "# Night Shift Report", "", @@ -28,9 +31,14 @@ pub fn render( "- Created at: " <> run.created_at, "- Updated at: " <> run.updated_at, "- Brief: " <> run.brief_path, + "- Provenance: " <> provenance.artifact_path(run), render_repo_state_section(run, repo_state_view), "", "## Summary", + "- Confidence posture: " + <> types.confidence_posture_to_string(confidence_assessment.posture), + "- Confidence reasons: " + <> confidence.reasons_summary(confidence_assessment), render_summary(run.decisions, run.planning_dirty, run.tasks, events), render_planning_validation_summary(events), render_failure_summary(run, events), diff --git a/src/night_shift/infra/run_store.gleam b/src/night_shift/infra/run_store.gleam index 976b464..de10a7e 100644 --- a/src/night_shift/infra/run_store.gleam +++ b/src/night_shift/infra/run_store.gleam @@ -5,6 +5,7 @@ import gleam/result import gleam/string import night_shift/codec/artifact_path import night_shift/codec/journal as journal_codec +import night_shift/domain/provenance import night_shift/domain/repo_state import night_shift/project import night_shift/report @@ -198,7 +199,8 @@ pub fn save( journal_codec.encode_run(run), )) use _ <- result.try(write_events(run.events_path, events)) - write_string(run.report_path, report.render_persisted(run, events)) + use _ <- result.try(write_string(run.report_path, report.render_persisted(run, events))) + provenance.write_persisted(run, events) } pub fn append_event( diff --git a/src/night_shift/types.gleam b/src/night_shift/types.gleam index 
2803b47..217f149 100644 --- a/src/night_shift/types.gleam +++ b/src/night_shift/types.gleam @@ -465,6 +465,47 @@ pub type RunSelector { RunId(String) } +pub type ConfidencePosture { + ConfidenceHigh + ConfidenceGuarded + ConfidenceLow +} + +pub fn confidence_posture_to_string(posture: ConfidencePosture) -> String { + case posture { + ConfidenceHigh -> "high" + ConfidenceGuarded -> "guarded" + ConfidenceLow -> "low" + } +} + +pub type ConfidenceAssessment { + ConfidenceAssessment(posture: ConfidencePosture, reasons: List(String)) +} + +pub type RecoveryClassification { + SafeToResume + ResumeWithWarning + RecoveryManualAttention + RecoveryIrrecoverable +} + +pub fn recovery_classification_to_string( + classification: RecoveryClassification, +) -> String { + case classification { + SafeToResume -> "safe_to_resume" + ResumeWithWarning -> "resume_with_warning" + RecoveryManualAttention -> "manual_attention" + RecoveryIrrecoverable -> "irrecoverable" + } +} + +pub type ProvenanceFormat { + ProvenanceJson + ProvenanceMarkdown +} + /// Repo-local operator configuration for Night Shift. 
pub type Config { Config( @@ -512,8 +553,14 @@ pub type Command { ) Status(run: RunSelector) Report(run: RunSelector) + Provenance( + run: RunSelector, + task_id: Option(String), + format: ProvenanceFormat, + ) + Doctor(run: RunSelector) Resolve(run: RunSelector) - Resume(run: RunSelector, ui_enabled: Bool) + Resume(run: RunSelector, ui_enabled: Bool, explain_only: Bool) Demo(ui_enabled: Bool) Help } diff --git a/src/night_shift/usecase/doctor.gleam b/src/night_shift/usecase/doctor.gleam new file mode 100644 index 0000000..8c02acb --- /dev/null +++ b/src/night_shift/usecase/doctor.gleam @@ -0,0 +1,361 @@ +import filepath +import gleam/list +import gleam/option.{type Option, None, Some} +import gleam/result +import gleam/string +import night_shift/git +import night_shift/journal +import night_shift/project +import night_shift/repo_state_runtime +import night_shift/types +import simplifile + +pub fn execute( + repo_root: String, + selector: types.RunSelector, + config: types.Config, +) -> Result(String, String) { + use #(run, events) <- result.try(journal.load(repo_root, selector)) + let repo_state_view = repo_state_runtime.inspect(run, config.branch_prefix).view + let active_lock = active_lock_state(repo_root, run.run_id) + let assessments = + run.tasks |> list.map(diagnose_task(repo_root, run.run_path, _)) + let recommendation = + recommend_next_action(run.status, events, active_lock, assessments) + + Ok(render_doctor(run, repo_state_view, active_lock, recommendation, assessments)) +} + +type ActiveLockState { + ActiveLockMissing + ActiveLockMatched + ActiveLockMismatch(run_id: String) +} + +type TaskAssessment { + TaskAssessment( + task: types.Task, + classification: types.RecoveryClassification, + reasons: List(String), + ) +} + +fn render_doctor( + run: types.RunRecord, + repo_state_view: Option(repo_state_runtime.RepoStateView), + active_lock: ActiveLockState, + recommendation: String, + assessments: List(TaskAssessment), +) -> String { + [ + "# Night Shift 
Recovery Doctor", + "", + "## Run", + "- Run ID: " <> run.run_id, + "- Status: " <> types.run_status_to_string(run.status), + "- Active lock: " <> active_lock_label(active_lock), + "- Recommendation: " <> recommendation, + case repo_state_view { + Some(view) -> + "- Review drift: " <> repo_state_runtime.drift_label(view.drift) + None -> "" + }, + "", + "## Task Assessments", + render_task_assessments(assessments), + ] + |> list.filter(fn(line) { line != "" }) + |> string.join(with: "\n") +} + +fn render_task_assessments(assessments: List(TaskAssessment)) -> String { + case assessments { + [] -> "- No tasks are recorded for this run." + _ -> + assessments + |> list.map(fn(assessment) { + "- " + <> assessment.task.id + <> " [" + <> types.recovery_classification_to_string(assessment.classification) + <> "] " + <> assessment.task.title + <> "\n " + <> string.join(assessment.reasons, with: "\n ") + }) + |> string.join(with: "\n") + } +} + +fn active_lock_state(repo_root: String, run_id: String) -> ActiveLockState { + case simplifile.read(project.active_lock_path(repo_root)) { + Ok(contents) -> + case string.trim(contents) { + value if value == run_id -> ActiveLockMatched + value -> ActiveLockMismatch(value) + } + Error(_) -> ActiveLockMissing + } +} + +fn active_lock_label(state: ActiveLockState) -> String { + case state { + ActiveLockMatched -> "matched" + ActiveLockMissing -> "missing" + ActiveLockMismatch(run_id) -> "points at " <> run_id + } +} + +fn diagnose_task( + repo_root: String, + run_path: String, + task: types.Task, +) -> TaskAssessment { + let git_log = filepath.join(run_path, "logs/" <> task.id <> ".doctor.git.log") + let execution_log = filepath.join(run_path, "logs/" <> task.id <> ".log") + let worktree_exists = + case task.worktree_path { + "" -> False + path -> directory_exists(path) + } + let mounted_worktree = case task.branch_name { + "" -> Ok(None) + _ -> git.mounted_worktree_path(repo_root, task.branch_name, git_log) + } + + case task.state { + 
types.Completed -> + TaskAssessment( + task: task, + classification: types.SafeToResume, + reasons: [ + "Task is already completed and does not need recovery work.", + ], + ) + types.Ready | types.Queued -> + TaskAssessment( + task: task, + classification: types.SafeToResume, + reasons: ["Task has not started yet; resume would schedule it normally."], + ) + types.Blocked | types.ManualAttention -> + TaskAssessment( + task: task, + classification: types.RecoveryManualAttention, + reasons: [ + "Task already requires operator attention before Night Shift can continue.", + ], + ) + types.Failed -> + TaskAssessment( + task: task, + classification: types.RecoveryManualAttention, + reasons: [ + "Task is already failed; inspect its report and logs before retrying.", + ], + ) + types.Running -> + diagnose_running_task( + task, + run_path, + execution_log, + worktree_exists, + mounted_worktree, + ) + } +} + +fn diagnose_running_task( + task: types.Task, + run_path: String, + execution_log: String, + worktree_exists: Bool, + mounted_worktree: Result(Option(String), String), +) -> TaskAssessment { + case task.worktree_path { + "" -> + TaskAssessment( + task: task, + classification: types.RecoveryIrrecoverable, + reasons: [ + "Task was running, but no worktree path was recorded.", + ], + ) + _ -> + case worktree_exists { + False -> + TaskAssessment( + task: task, + classification: types.RecoveryIrrecoverable, + reasons: [ + "Recorded worktree path no longer exists on disk.", + ], + ) + True -> { + let doctor_git_log = + filepath.join(run_path, "logs/" <> task.id <> ".doctor.has-changes.log") + case git.has_changes( + task.worktree_path, + doctor_git_log, + ) { + True -> + TaskAssessment( + task: task, + classification: types.RecoveryManualAttention, + reasons: [ + "Worktree has uncommitted changes; `resume` would convert this task into manual attention.", + ], + ) + False -> + diagnose_clean_running_task(task, execution_log, mounted_worktree) + } + } + } + } +} + +fn 
diagnose_clean_running_task( + task: types.Task, + execution_log: String, + mounted_worktree: Result(Option(String), String), +) -> TaskAssessment { + case mounted_worktree { + Error(message) -> + TaskAssessment( + task: task, + classification: types.ResumeWithWarning, + reasons: [ + "Night Shift could not confirm the mounted worktree for this branch.", + message, + ], + ) + Ok(Some(mounted_path)) -> + case mounted_path == task.worktree_path, file_exists(execution_log) { + False, _ -> + TaskAssessment( + task: task, + classification: types.ResumeWithWarning, + reasons: [ + "Branch is mounted at a different path than the run journal recorded.", + "Recorded path: " <> task.worktree_path, + "Mounted path: " <> mounted_path, + ], + ) + True, False -> + TaskAssessment( + task: task, + classification: types.ResumeWithWarning, + reasons: [ + "Execution log is missing, so recovery evidence is incomplete.", + "Expected log: " <> execution_log, + ], + ) + True, True -> + TaskAssessment( + task: task, + classification: types.SafeToResume, + reasons: [ + "Worktree is mounted, clean, and matches the recorded branch.", + "`resume` should requeue this interrupted task safely.", + ], + ) + } + Ok(None) -> + TaskAssessment( + task: task, + classification: types.ResumeWithWarning, + reasons: [ + "Branch is not mounted in git worktree metadata; Night Shift may need to reattach it during recovery.", + ], + ) + } +} + +fn recommend_next_action( + status: types.RunStatus, + events: List(types.RunEvent), + active_lock: ActiveLockState, + assessments: List(TaskAssessment), +) -> String { + case latest_environment_preflight_failure(events) { + Some(_) -> + "Fix the worktree environment first, then rerun `night-shift start` instead of resuming blindly." + None -> + case status { + types.RunCompleted -> + "This run is already completed; inspect the report and retained worktrees instead of resuming." 
+ _ -> + case has_classification(assessments, types.RecoveryIrrecoverable) { + True -> + "At least one task is irrecoverable from saved state; inspect the journal and replan rather than resuming." + False -> + case has_classification( + assessments, + types.RecoveryManualAttention, + ) { + True -> + "Resolve the manual-attention tasks first; `resume` would not safely clear them." + False -> + case active_lock { + ActiveLockMismatch(other_run_id) -> + "Another run lock is active (" + <> other_run_id + <> "); clear that ambiguity before resuming." + _ -> + case has_classification( + assessments, + types.ResumeWithWarning, + ) { + True -> + "Resume is possible, but review the warnings above before you let Night Shift continue." + False -> + "Resume should be safe from the saved run state." + } + } + } + } + } + } +} + +fn has_classification( + assessments: List(TaskAssessment), + target: types.RecoveryClassification, +) -> Bool { + list.any(assessments, fn(assessment) { + assessment.classification == target + }) +} + +fn latest_environment_preflight_failure( + events: List(types.RunEvent), +) -> Option(String) { + latest_environment_preflight_failure_loop(list.reverse(events)) +} + +fn latest_environment_preflight_failure_loop( + events: List(types.RunEvent), +) -> Option(String) { + case events { + [] -> None + [event, ..rest] -> + case event.kind == "environment_preflight_failed" { + True -> Some(event.message) + False -> latest_environment_preflight_failure_loop(rest) + } + } +} + +fn directory_exists(path: String) -> Bool { + case simplifile.read_directory(at: path) { + Ok(_) -> True + Error(_) -> False + } +} + +fn file_exists(path: String) -> Bool { + case simplifile.read(path) { + Ok(_) -> True + Error(_) -> False + } +} diff --git a/src/night_shift/usecase/plan.gleam b/src/night_shift/usecase/plan.gleam index 746e9c7..76aecfb 100644 --- a/src/night_shift/usecase/plan.gleam +++ b/src/night_shift/usecase/plan.gleam @@ -4,6 +4,7 @@ import gleam/result import 
night_shift/agent_config import night_shift/domain/repo_state import night_shift/github +import night_shift/journal import night_shift/orchestrator import night_shift/project import night_shift/provider @@ -67,13 +68,22 @@ pub fn execute( True -> orchestrator.replan(seeded_run) False -> orchestrator.plan(seeded_run) }) + use recorded_run <- result.try(journal.append_event( + planned_run, + types.RunEvent( + kind: "planning_artifacts_recorded", + at: system.timestamp(), + message: "Planning artifacts: " <> artifact_path, + task_id: None, + ), + )) Ok(workflow.PlanResult( - run: planned_run, + run: recorded_run, brief_path: target_doc_path, artifact_path: artifact_path, planning_provenance: planning_provenance, warnings: config_warnings(config), - next_action: runs.next_action_for_run(planned_run), + next_action: runs.next_action_for_run(recorded_run), )) } diff --git a/src/night_shift/usecase/provenance.gleam b/src/night_shift/usecase/provenance.gleam new file mode 100644 index 0000000..1e01f50 --- /dev/null +++ b/src/night_shift/usecase/provenance.gleam @@ -0,0 +1,25 @@ +import gleam/result +import gleam/option.{type Option} +import night_shift/domain/provenance as provenance_domain +import night_shift/journal +import night_shift/repo_state_runtime +import night_shift/types + +pub fn execute( + repo_root: String, + selector: types.RunSelector, + task_id: Option(String), + format: types.ProvenanceFormat, + config: types.Config, +) -> Result(String, String) { + use #(run, events) <- result.try(journal.load(repo_root, selector)) + let repo_state_view = repo_state_runtime.inspect(run, config.branch_prefix).view + provenance_domain.render( + run, + events, + repo_state_view, + task_id, + format, + config.verification_commands, + ) +} diff --git a/src/night_shift/usecase/render.gleam b/src/night_shift/usecase/render.gleam index 1e5ebd5..ed73b20 100644 --- a/src/night_shift/usecase/render.gleam +++ b/src/night_shift/usecase/render.gleam @@ -3,6 +3,7 @@ import gleam/list 
import gleam/option.{type Option, None, Some} import gleam/string import night_shift/agent_config +import night_shift/domain/confidence import night_shift/domain/decisions as decision_domain import night_shift/domain/review_run_projection import night_shift/repo_state_runtime @@ -60,11 +61,18 @@ pub fn render_status(view: result.StatusResult) -> String { <> render_notes_source(view.run.notes_source) <> render_repo_state_fragment(view.run, view.repo_state_view) <> "\n" + <> "Confidence: " + <> types.confidence_posture_to_string(view.confidence.posture) + <> "\nConfidence reasons: " + <> confidence.reasons_summary(view.confidence) + <> "\n" <> view.summary <> "\nEvents: " <> int.to_string(list.length(view.events)) <> "\nReport: " <> view.run.report_path + <> "\nProvenance: " + <> view.provenance_path } pub fn render_resolve(view: result.ResolveResult) -> String { diff --git a/src/night_shift/usecase/result.gleam b/src/night_shift/usecase/result.gleam index 875c853..7bb5cbc 100644 --- a/src/night_shift/usecase/result.gleam +++ b/src/night_shift/usecase/result.gleam @@ -27,6 +27,8 @@ pub type StatusResult { run: types.RunRecord, events: List(types.RunEvent), repo_state_view: Option(repo_state_runtime.RepoStateView), + confidence: types.ConfidenceAssessment, + provenance_path: String, summary: String, next_action: String, ) diff --git a/src/night_shift/usecase/status.gleam b/src/night_shift/usecase/status.gleam index 2786a50..ab7c82e 100644 --- a/src/night_shift/usecase/status.gleam +++ b/src/night_shift/usecase/status.gleam @@ -1,4 +1,6 @@ import gleam/result +import night_shift/domain/confidence +import night_shift/domain/provenance import night_shift/domain/status import night_shift/repo_state_runtime import night_shift/types @@ -13,10 +15,14 @@ pub fn execute( use #(run, events) <- result.try(runs.load_display_run(repo_root, selector)) let next_action = runs.next_action_for_run(run) let inspection = repo_state_runtime.inspect(run, config.branch_prefix) + let 
confidence_assessment = + confidence.assess(run, events, inspection.view) Ok(workflow.StatusResult( run: run, events: events, repo_state_view: inspection.view, + confidence: confidence_assessment, + provenance_path: provenance.artifact_path(run), summary: status.summary(run, events, next_action), next_action: next_action, )) diff --git a/test/night_shift_cli_config_test.gleam b/test/night_shift_cli_config_test.gleam index e6cd2e9..d009e87 100644 --- a/test/night_shift_cli_config_test.gleam +++ b/test/night_shift_cli_config_test.gleam @@ -85,10 +85,29 @@ pub fn parse_resolve_defaults_to_latest_test() { } pub fn parse_resume_command_with_ui_test() { - let assert Ok(types.Resume(types.RunId("run-123"), True)) = + let assert Ok(types.Resume(types.RunId("run-123"), True, False)) = cli.parse(["resume", "--run", "run-123", "--ui"]) } +pub fn parse_resume_explain_command_test() { + let assert Ok(types.Resume(types.LatestRun, False, True)) = + cli.parse(["resume", "--explain"]) +} + +pub fn parse_doctor_command_test() { + let assert Ok(types.Doctor(types.RunId("run-123"))) = + cli.parse(["doctor", "--run", "run-123"]) +} + +pub fn parse_provenance_command_test() { + let assert Ok(types.Provenance( + types.LatestRun, + Some("task-1"), + types.ProvenanceJson, + )) = + cli.parse(["provenance", "--task", "task-1", "--format", "json"]) +} + pub fn parse_resume_rejects_environment_flag_test() { let assert Error(message) = cli.parse(["resume", "--environment", "dev"]) assert message == "Unsupported flag: --environment" diff --git a/test/trust_surface_test.gleam b/test/trust_surface_test.gleam new file mode 100644 index 0000000..8b34568 --- /dev/null +++ b/test/trust_surface_test.gleam @@ -0,0 +1,310 @@ +import filepath +import gleam/option.{None, Some} +import gleam/string +import night_shift/dashboard +import night_shift/domain/provenance as provenance_domain +import night_shift/domain/repo_state +import night_shift/git +import night_shift/journal +import night_shift/report 
+import night_shift/repo_state_runtime
+import night_shift/shell
+import night_shift/system
+import night_shift/types
+import night_shift/usecase/doctor
+import night_shift_test_support as support
+import simplifile
+
+pub fn persisted_run_writes_provenance_artifact_test() {
+  let unique = system.unique_id()
+  let base_dir =
+    support.absolute_path(filepath.join(
+      system.state_directory(),
+      "night-shift-provenance-" <> unique,
+    ))
+  let repo_root = filepath.join(base_dir, "repo")
+  let brief_path = filepath.join(base_dir, "brief.md")
+
+  let _ = simplifile.delete(file_or_dir_at: base_dir)
+  let assert Ok(_) = simplifile.create_directory_all(base_dir)
+  let assert Ok(_) = simplifile.write("# Brief", to: brief_path)
+  let assert Ok(run) = support.start_run(repo_root, brief_path, types.Codex, 1)
+  let assert Ok(provenance_contents) =
+    simplifile.read(filepath.join(run.run_path, "provenance.json"))
+
+  assert string.contains(does: provenance_contents, contain: "\"run_id\"")
+  assert string.contains(
+    does: provenance_contents,
+    contain: "\"provenance_path\":\""
+      <> filepath.join(run.run_path, "provenance.json")
+      <> "\"",
+  )
+
+  let _ = simplifile.delete(file_or_dir_at: base_dir)
+}
+
+pub fn report_includes_confidence_and_provenance_test() {
+  let rendered = report.render_live(review_run(), [], Some(review_state_view()))
+
+  assert string.contains(does: rendered, contain: "Confidence posture:")
+  assert string.contains(does: rendered, contain: "Confidence reasons:")
+  assert string.contains(does: rendered, contain: "Provenance: /tmp/repo/.night-shift/runs/review-run/provenance.json")
+}
+
+pub fn provenance_render_includes_review_drift_test() {
+  let assert Ok(rendered) =
+    provenance_domain.render(
+      review_run(),
+      [],
+      Some(review_state_view()),
+      None,
+      types.ProvenanceJson,
+      [],
+    )
+
+  assert string.contains(does: rendered, contain: "\"review_state\"")
+  assert string.contains(does: rendered, contain: "\"drift\":\"yes\"")
+  assert string.contains(
+    does: rendered,
+    contain: "\"superseded_pr_numbers\":[12]",
+  )
+}
+
+pub fn dashboard_payload_includes_confidence_and_provenance_test() {
+  let unique = system.unique_id()
+  let base_dir =
+    support.absolute_path(filepath.join(
+      system.state_directory(),
+      "night-shift-dashboard-trust-" <> unique,
+    ))
+  let repo_root = filepath.join(base_dir, "repo")
+  let brief_path = filepath.join(base_dir, "brief.md")
+
+  let _ = simplifile.delete(file_or_dir_at: base_dir)
+  let assert Ok(_) = simplifile.create_directory_all(base_dir)
+  let assert Ok(_) = simplifile.write("# Brief", to: brief_path)
+  let assert Ok(run) = support.start_run(repo_root, brief_path, types.Codex, 1)
+  let assert Ok(updated_run) =
+    journal.append_event(
+      run,
+      types.RunEvent(
+        kind: "execution_payload_warning",
+        at: system.timestamp(),
+        message: "Accepted a recovered execution payload.",
+        task_id: Some("demo-task"),
+      ),
+    )
+  let assert Ok(run_payload) = dashboard.run_json(repo_root, updated_run.run_id)
+
+  assert string.contains(does: run_payload, contain: "\"confidence_posture\"")
+  assert string.contains(does: run_payload, contain: "\"provenance_path\"")
+
+  let _ = simplifile.delete(file_or_dir_at: base_dir)
+}
+
+pub fn doctor_flags_dirty_and_missing_worktrees_test() {
+  let unique = system.unique_id()
+  let base_dir =
+    support.absolute_path(filepath.join(
+      system.state_directory(),
+      "night-shift-doctor-" <> unique,
+    ))
+  let repo_root = filepath.join(base_dir, "repo")
+  let brief_path = filepath.join(base_dir, "brief.md")
+  let missing_worktree = filepath.join(base_dir, "missing-worktree")
+
+  let _ = simplifile.delete(file_or_dir_at: base_dir)
+  let assert Ok(_) = simplifile.create_directory_all(repo_root)
+  let assert Ok(_) = simplifile.write("# Brief", to: brief_path)
+  support.seed_git_repo(repo_root, base_dir)
+  let assert Ok(run) = support.start_run(repo_root, brief_path, types.Codex, 1)
+  let assert Ok(_) =
+    simplifile.write("dirty\n", to: filepath.join(repo_root, "DIRTY.md"))
+  let updated_run =
+    types.RunRecord(
+      ..run,
+      tasks: [
+        types.Task(
+          id: "dirty-task",
+          title: "Dirty task",
+          description: "",
+          dependencies: [],
+          acceptance: [],
+          demo_plan: [],
+          decision_requests: [],
+          superseded_pr_numbers: [],
+          kind: types.ImplementationTask,
+          execution_mode: types.Serial,
+          state: types.Running,
+          worktree_path: repo_root,
+          branch_name: "night-shift/dirty-task",
+          pr_number: "",
+          summary: "",
+        ),
+        types.Task(
+          id: "missing-task",
+          title: "Missing task",
+          description: "",
+          dependencies: [],
+          acceptance: [],
+          demo_plan: [],
+          decision_requests: [],
+          superseded_pr_numbers: [],
+          kind: types.ImplementationTask,
+          execution_mode: types.Serial,
+          state: types.Running,
+          worktree_path: missing_worktree,
+          branch_name: "night-shift/missing-task",
+          pr_number: "",
+          summary: "",
+        ),
+      ],
+    )
+  let assert Ok(_) = journal.rewrite_run(updated_run)
+  let assert Ok(rendered) =
+    doctor.execute(repo_root, types.LatestRun, types.default_config())
+
+  assert string.contains(does: rendered, contain: "[manual_attention] Dirty task")
+  assert string.contains(does: rendered, contain: "[irrecoverable] Missing task")
+
+  let _ = simplifile.delete(file_or_dir_at: base_dir)
+}
+
+pub fn doctor_does_not_write_probe_log_into_worktree_test() {
+  let unique = system.unique_id()
+  let base_dir =
+    support.absolute_path(filepath.join(
+      system.state_directory(),
+      "night-shift-doctor-clean-" <> unique,
+    ))
+  let repo_root = filepath.join(base_dir, "repo")
+  let brief_path = filepath.join(base_dir, "brief.md")
+  let worktree_path = filepath.join(base_dir, "clean-worktree")
+  let probe_path = filepath.join(worktree_path, ".night-shift-doctor.log")
+  let git_log = filepath.join(base_dir, "worktree-add.log")
+
+  let _ = simplifile.delete(file_or_dir_at: base_dir)
+  let assert Ok(_) = simplifile.create_directory_all(repo_root)
+  let assert Ok(_) = simplifile.write("# Brief", to: brief_path)
+  support.seed_git_repo(repo_root, base_dir)
+  let assert Ok(_) =
+    git.create_worktree(
+      repo_root,
+      worktree_path,
+      "night-shift/clean-task",
+      "main",
+      git_log,
+    )
+  let assert Ok(run) = support.start_run(repo_root, brief_path, types.Codex, 1)
+  let updated_run =
+    types.RunRecord(
+      ..run,
+      tasks: [
+        types.Task(
+          id: "clean-task",
+          title: "Clean task",
+          description: "",
+          dependencies: [],
+          acceptance: [],
+          demo_plan: [],
+          decision_requests: [],
+          superseded_pr_numbers: [],
+          kind: types.ImplementationTask,
+          execution_mode: types.Serial,
+          state: types.Running,
+          worktree_path: worktree_path,
+          branch_name: "night-shift/clean-task",
+          pr_number: "",
+          summary: "",
+        ),
+      ],
+    )
+  let assert Ok(_) = journal.rewrite_run(updated_run)
+  let assert Ok(rendered) =
+    doctor.execute(repo_root, types.LatestRun, types.default_config())
+
+  assert string.contains(does: rendered, contain: "[resume_with_warning] Clean task")
+  let assert Error(_) = simplifile.read(probe_path)
+
+  let _ =
+    shell.run(
+      "git worktree remove --force " <> shell.quote(worktree_path),
+      repo_root,
+      filepath.join(base_dir, "worktree-remove.log"),
+    )
+  let _ = simplifile.delete(file_or_dir_at: base_dir)
+}
+
+fn review_run() -> types.RunRecord {
+  types.RunRecord(
+    run_id: "review-run",
+    repo_root: "/tmp/repo",
+    run_path: "/tmp/repo/.night-shift/runs/review-run",
+    brief_path: "/tmp/repo/.night-shift/runs/review-run/brief.md",
+    state_path: "/tmp/repo/.night-shift/runs/review-run/state.json",
+    events_path: "/tmp/repo/.night-shift/runs/review-run/events.jsonl",
+    report_path: "/tmp/repo/.night-shift/runs/review-run/report.md",
+    lock_path: "/tmp/repo/.night-shift/active.lock",
+    planning_agent: types.resolved_agent_from_provider(types.Codex),
+    execution_agent: types.resolved_agent_from_provider(types.Codex),
+    environment_name: "default",
+    max_workers: 1,
+    notes_source: None,
+    planning_provenance: Some(types.ReviewsOnly),
+    repo_state_snapshot: Some(repo_state_snapshot()),
+    decisions: [],
+    planning_dirty: False,
+    status: types.RunCompleted,
+    created_at: "2026-04-13T17:30:00Z",
+    updated_at: "2026-04-13T18:02:00Z",
+    tasks: [
+      types.Task(
+        id: "rewrite-root",
+        title: "rewrite-root",
+        description: "",
+        dependencies: [],
+        acceptance: [],
+        demo_plan: [],
+        decision_requests: [],
+        superseded_pr_numbers: [12],
+        kind: types.ImplementationTask,
+        execution_mode: types.Serial,
+        state: types.Completed,
+        worktree_path: "/tmp/repo/.night-shift/runs/review-run/worktrees/rewrite-root",
+        branch_name: "night-shift/rewrite-root",
+        pr_number: "15",
+        summary: "Updated rewrite-root",
+      ),
+    ],
+  )
+}
+
+fn repo_state_snapshot() -> repo_state.RepoStateSnapshot {
+  repo_state.RepoStateSnapshot(
+    captured_at: "2026-04-13T17:30:00Z",
+    digest: "digest",
+    open_pull_requests: [
+      repo_state.RepoPullRequestSnapshot(
+        number: 12,
+        title: "Root rewrite",
+        url: "https://example.test/pr/12",
+        head_ref_name: "night-shift/root",
+        base_ref_name: "main",
+        review_decision: "REVIEW_REQUIRED",
+        failing_checks: [],
+        review_comments: ["Please rewrite the root document."],
+        actionable: True,
+        impacted: True,
+      ),
+    ],
+  )
+}
+
+fn review_state_view() -> repo_state_runtime.RepoStateView {
+  repo_state_runtime.RepoStateView(
+    snapshot_captured_at: "2026-04-13T17:30:00Z",
+    open_pr_count: 2,
+    actionable_pr_count: 1,
+    drift: repo_state_runtime.RepoStateDrifted,
+  )
+}