20 changes: 17 additions & 3 deletions .codex/skills/qa-night-shift/SKILL.md
@@ -1,6 +1,6 @@
---
name: qa-night-shift
-description: Use when the user wants to QA test Night Shift against a user-specified scratch repo path, install the current worktree CLI, and run an approval-gated real-provider pass to validate init/plan/start/status/report/resolve/resume behavior.
+description: Use when the user wants to QA test Night Shift against a user-specified scratch repo path, install the current worktree CLI, and run an approval-gated real-provider pass to validate init/plan/start/status/report/provenance/doctor/resolve/resume behavior.
---

# QA Night Shift
@@ -59,7 +59,10 @@ real inference spend.
- If it does look like an intentional testing target, proceed.
- Even for an obvious scratch repo, do not run `night-shift plan`,
`night-shift start`, `night-shift resume`, or other inference-consuming QA
-steps until the user approves the presented plan.
+steps until the user approves the presented plan. Read-only checks such as
+`night-shift status`, `night-shift report`, `night-shift provenance`,
+`night-shift doctor`, or `night-shift resume --explain` are acceptable once
+the user-approved QA pass reaches the relevant state.
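
A QA wrapper script could make this approval gate mechanical with a small classifier over the subcommand. The sketch below is illustrative only: `ns_needs_approval` is a hypothetical name, and the command lists come from the paragraph above rather than from Night Shift itself. Anything it does not recognize is conservatively treated as needing approval.

```sh
# Hypothetical helper, not part of Night Shift.
# Exit status 0 means "wait for user approval"; 1 means "read-only check".
ns_needs_approval() {
  case "$1" in
    status|report|provenance|doctor)
      return 1 ;;
    resume)
      # `resume --explain` is the dry read; a plain `resume` spends inference.
      for arg in "$@"; do
        if [ "$arg" = "--explain" ]; then
          return 1
        fi
      done
      return 0 ;;
    *)
      # plan, start, and anything unrecognized are treated as spending.
      return 0 ;;
  esac
}
```

A caller might write `ns_needs_approval "$@" && ask_user_first`, where `ask_user_first` is whatever approval prompt the QA harness already has.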

Do not quietly assume a normal product repo is safe to use for QA.

@@ -123,7 +126,10 @@ Typical flow:
5. inspect `night-shift status`
6. run `night-shift start`
7. inspect `night-shift report`
-8. use `night-shift resolve` or `night-shift resume` only if the run actually
+8. inspect `night-shift provenance`
+9. use `night-shift doctor` or `night-shift resume --explain` before any real
+resume attempt when the run was interrupted
+10. use `night-shift resolve` or `night-shift resume` only if the run actually
requires it

For review-driven investigations, replace steps 3-4 with:
@@ -167,6 +173,13 @@ In review-driven runs, pay attention to repo-state evidence:
manual attention
- whether `status` and `report` show payload-repair attempts, successes, and
failures with usable artifact paths
+- whether `status`, `report`, and the dashboard agree on the confidence posture
+and its reasons
+- whether `provenance` records the expected prompt paths, payload artifacts,
+verification evidence, worktree paths, and PR linkage
+- whether `doctor` classifies interrupted tasks as `safe_to_resume`,
+`resume_with_warning`, `manual_attention`, or `irrecoverable` for the actual
+saved repo state

Use small tasks that validate the requested behavior instead of inviting large
feature work.
@@ -180,6 +193,7 @@ Collect evidence from:
- relevant CLI output
- the current report path printed by Night Shift
- run journal paths under `.night-shift/runs/`
+- the `provenance.json` path and any task-specific artifact paths it surfaces
- relevant logs for the failing or surprising step
- PR or delivery results when they happen
- any verification output tied to the run
6 changes: 6 additions & 0 deletions README.md
@@ -31,11 +31,14 @@ night-shift plan --notes notes/today.md
night-shift start
night-shift status
night-shift report
+night-shift provenance
```

Supporting commands round out the lifecycle:

- `resolve` records answers for blocked planning decisions and replans the run
+- `doctor` explains whether a saved run is safe to resume and why
+- `provenance` renders a per-run evidence ledger from saved artifacts
- `resume` recovers an interrupted run from saved state
- `plan --from-reviews` turns open Night Shift PR feedback into a fresh
successor stack
@@ -105,6 +108,7 @@ Inspect progress and outputs:
```sh
night-shift status
night-shift report
+night-shift provenance
```

If planning blocked on manual decisions:
@@ -117,6 +121,8 @@ night-shift start
If Night Shift was interrupted mid-run:

```sh
+night-shift doctor
+night-shift resume --explain
night-shift resume
```

6 changes: 5 additions & 1 deletion docs/README.md
@@ -14,7 +14,8 @@ If you are new to the project, start here:
- [Getting Started](getting-started.md) for install, prerequisites, and the
first runnable flow
- [Run Lifecycle](run-lifecycle.md) for how `plan`, `start`, `resolve`,
-`resume`, `plan --from-reviews`, and `reset` fit together
+`resume`, `doctor`, `provenance`, `plan --from-reviews`, and `reset` fit
+together
- [Configuration](configuration.md) for `config.toml` profiles and override
precedence
- [Worktree Environments](worktree-environments.md) for
@@ -40,11 +41,14 @@ night-shift plan --notes notes/today.md
night-shift start
night-shift status
night-shift report
+night-shift provenance
```

Supporting flows handle the messier parts of reality:

- `resolve` records answers for manual-attention tasks and replans in place
+- `doctor` explains whether an interrupted run looks safe to resume
+- `provenance` prints the run's evidence ledger
- `resume` reattaches to an interrupted run
- `plan --from-reviews` turns open Night Shift PR feedback into a fresh successor stack
- `reset` removes Night Shift state and tracked task worktrees, but does not touch local branches or remote PRs
13 changes: 10 additions & 3 deletions docs/getting-started.md
@@ -121,11 +121,12 @@ Use these commands while a run is active or after it finishes:
```sh
night-shift status
night-shift report
+night-shift provenance
```

-`status` prints the current run state, planning and execution agent summaries,
-notes source, event count, and report location. `report` prints the current
-markdown report directly.
+`status` prints the current run state, confidence posture, provenance path,
+and report location. `report` prints the current markdown report directly, and
+`provenance` prints the run's evidence ledger from the saved artifact graph.

## Supporting Flows

@@ -140,10 +141,16 @@ night-shift start
If a run was interrupted, resume from the saved journal:

```sh
+night-shift doctor
+night-shift resume --explain
night-shift resume
night-shift resume --ui
```

+`doctor` is the dry recovery pass. It classifies each task as
+`safe_to_resume`, `resume_with_warning`, `manual_attention`, or
+`irrecoverable` before you mutate any run state.

If open Night Shift pull requests received feedback and you want a fresh
replacement stack instead of in-place edits:

7 changes: 4 additions & 3 deletions docs/index.md
@@ -33,9 +33,10 @@ night-shift report
```

Use `resolve` when planning needs human decisions, `resume` when a run was
-interrupted, `plan --from-reviews` when open Night Shift PRs need a fresh
-successor stack, and `reset` when you need to eject the repo-local control
-plane and start over.
+interrupted, `doctor` or `resume --explain` when you want a dry recovery read,
+`plan --from-reviews` when open Night Shift PRs need a fresh successor stack,
+and `reset` when you need to eject the repo-local control plane and start
+over.

## Repository

23 changes: 23 additions & 0 deletions docs/run-lifecycle.md
@@ -58,6 +58,8 @@ and the next action becomes `night-shift start`.
`resume` is the recovery path for an interrupted run:

```sh
+night-shift doctor
+night-shift resume --explain
night-shift resume
night-shift resume --run run-123 --ui
```
@@ -66,6 +68,11 @@ Night Shift reloads the saved run, validates the saved environment, recovers
in-flight tasks, and continues orchestration. It does not re-resolve provider
or environment settings; it reuses what the run journal already saved.

+`doctor` and `resume --explain` are the read-only recovery surfaces. They
+inspect the saved run, active lock, worktrees, logs, review drift, and
+interrupted task states, then classify each task as `safe_to_resume`,
+`resume_with_warning`, `manual_attention`, or `irrecoverable`.
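
A wrapper script could use these classifications to decide whether to proceed to a real `resume`. The sketch below is one guess at such a gate. It assumes the four labels appear verbatim somewhere in the `doctor` text output, which the docs above do not specify, and `ns_resume_gate` is a hypothetical name.

```sh
# Hypothetical gate; reads `night-shift doctor` output on stdin.
# Assumes the classification labels appear verbatim in that output.
ns_resume_gate() {
  doctor_output=$(cat)
  # Any task needing a human, or beyond recovery, blocks an automatic resume.
  if printf '%s\n' "$doctor_output" | grep -Eq 'manual_attention|irrecoverable'; then
    echo "blocked"
    return 2
  fi
  if printf '%s\n' "$doctor_output" | grep -q 'resume_with_warning'; then
    echo "warn"
    return 1
  fi
  if printf '%s\n' "$doctor_output" | grep -q 'safe_to_resume'; then
    echo "ok"
    return 0
  fi
  # No recognized label at all: refuse rather than guess.
  echo "unknown"
  return 3
}
```

With that in place, `night-shift doctor | ns_resume_gate && night-shift resume` would only resume when every reported task looks safe. The strictest label wins, so one `irrecoverable` task blocks the whole run.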

## Review-Driven Replanning

Review feedback re-enters Night Shift through `plan --from-reviews`:
@@ -99,6 +106,21 @@ it recomputes drift against the current PR tree when the run has a stored
review snapshot, while the on-disk `report.md` remains the stable persisted
artifact for the run.

+## Provenance
+
+`provenance` is the operator-facing evidence ledger for a run:
+
+```sh
+night-shift provenance
+night-shift provenance --run run-123 --format json
+night-shift provenance --task task-1
+```
+
+Night Shift persists `./.night-shift/runs/<run-id>/provenance.json` alongside
+`report.md`. The command normalizes the run journal, prompt artifacts, logs,
+payload-repair traces, verification artifacts, worktree paths, and confidence
+posture into one inspectable view.
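
Because the ledger lives at a predictable path, a script can check for it without going through the CLI. The helper below is a sketch under that assumption. `ns_provenance_path` is a hypothetical name, and it relies only on the `./.night-shift/runs/<run-id>/provenance.json` layout described above, not on the file's contents.

```sh
# Hypothetical helper: print the provenance ledger path for a run,
# or fail if the file has not been persisted yet.
ns_provenance_path() {
  repo_root=$1
  run_id=$2
  ledger="$repo_root/.night-shift/runs/$run_id/provenance.json"
  if [ -f "$ledger" ]; then
    printf '%s\n' "$ledger"
  else
    echo "no provenance.json for run $run_id" >&2
    return 1
  fi
}
```

For example, `ns_provenance_path . run-123` prints the ledger path for that run if it exists, which makes it easy to attach the file to QA evidence.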

## Reset

`reset` is the eject handle when the repo-local control plane has to go:
@@ -141,6 +163,7 @@ Night Shift binds to `127.0.0.1`, prefers port `8787`, and serves:
- run history for the current repository
- run summary metadata
- repo-state summary for review-driven runs, including open PR counts and drift
+- confidence posture and provenance path
- task status
- event timeline
- report content
10 changes: 10 additions & 0 deletions docs/state-and-artifacts.md
@@ -33,6 +33,7 @@ Each run directory contains durable state for one run:
- `state.json`
- `events.jsonl`
- `report.md`
+- `provenance.json`
- `logs/`
- `worktrees/`

@@ -53,6 +54,11 @@ The run record itself stores:
- task list and task states
- timestamps and current run status

+`provenance.json` is the normalized evidence ledger for the run. It reuses the
+saved run state plus artifact paths under `logs/` to record planning
+provenance, prompt and payload traces, verification evidence, touched files,
+worktree paths, PR linkage, and confidence posture.

## Planning Artifacts

Planning writes artifacts under `./.night-shift/planning/<timestamp>/`. Those
@@ -88,6 +94,10 @@ review-driven runs: it refreshes repo-state drift against the current open PR
tree when a stored snapshot exists, so its live output is authoritative for
current drift while `report.md` remains durable and offline-readable.

+Likewise, the persisted `provenance.json` is the stable audit artifact for the
+run, while `night-shift provenance` can render the same evidence in markdown or
+refresh live review drift in JSON output.

Task-level provider logs and prompt files live under each run's `logs/`
directory.

34 changes: 32 additions & 2 deletions src/night_shift/app.gleam
@@ -25,8 +25,10 @@ import night_shift/repo_state_runtime
import night_shift/report
import night_shift/system
import night_shift/types
+import night_shift/usecase/doctor as doctor_usecase
import night_shift/usecase/init as init_usecase
import night_shift/usecase/plan as plan_usecase
+import night_shift/usecase/provenance as provenance_usecase
import night_shift/usecase/render as usecase_render
import night_shift/usecase/reset as reset_usecase
import night_shift/usecase/resolve as resolve_usecase
@@ -131,9 +133,13 @@ fn run_initialized_command(
types.Start(run, True) -> start_with_ui(repo_root, run, config)
types.Status(run) -> io.println(status(repo_root, run, config))
types.Report(run) -> io.println(report(repo_root, run, config))
+types.Provenance(run, task_id, format) ->
+io.println(provenance(repo_root, run, task_id, format, config))
+types.Doctor(run) -> io.println(doctor(repo_root, run, config))
types.Resolve(run) -> io.println(resolve(repo_root, run, config))
-types.Resume(run, False) -> io.println(resume(repo_root, run, config))
-types.Resume(run, True) -> resume_with_ui(repo_root, run, config)
+types.Resume(run, False, False) -> io.println(resume(repo_root, run, config))
+types.Resume(run, True, False) -> resume_with_ui(repo_root, run, config)
+types.Resume(run, False, True) -> io.println(doctor(repo_root, run, config))
_ -> io.println("Unsupported command.")
}
}
@@ -276,6 +282,30 @@ fn resume(
}
}

+fn doctor(
+repo_root: String,
+run: types.RunSelector,
+config: types.Config,
+) -> String {
+case doctor_usecase.execute(repo_root, run, config) {
+Ok(rendered) -> rendered
+Error(message) -> message
+}
+}
+
+fn provenance(
+repo_root: String,
+run: types.RunSelector,
+task_id: Option(String),
+format: types.ProvenanceFormat,
+config: types.Config,
+) -> String {
+case provenance_usecase.execute(repo_root, run, task_id, format, config) {
+Ok(rendered) -> rendered
+Error(message) -> message
+}
+}

fn stringify_notifiers(notifiers: List(types.NotifierName)) -> String {
notifiers
|> list.map(types.notifier_to_string)
51 changes: 45 additions & 6 deletions src/night_shift/cli.gleam
@@ -18,8 +18,10 @@ pub fn usage() -> String {
<> " start [--run <id>|latest] [--ui]\n"
<> " status [--run <id>|latest]\n"
<> " report [--run <id>|latest]\n"
+<> " provenance [--run <id>|latest] [--task <task-id>] [--format <json|md>]\n"
+<> " doctor [--run <id>|latest]\n"
<> " resolve [--run <id>|latest]\n"
-<> " resume [--run <id>|latest] [--ui]\n"
+<> " resume [--run <id>|latest] [--ui|--explain]\n"
}

/// Parse raw command-line arguments into a `Command`.
Expand Down Expand Up @@ -48,6 +50,8 @@ pub fn parse(args: List(String)) -> Result(types.Command, String) {
["start", ..rest] -> parse_start(rest)
["status", ..rest] -> parse_run_lookup(rest, types.Status)
["report", ..rest] -> parse_run_lookup(rest, types.Report)
+["provenance", ..rest] -> parse_provenance(rest)
+["doctor", ..rest] -> parse_run_lookup(rest, types.Doctor)
["resolve", ..rest] -> parse_run_lookup(rest, types.Resolve)
["resume", ..rest] -> parse_resume(rest)
["review", ..] ->
@@ -256,21 +260,56 @@ fn parse_start_flags(
}

fn parse_resume(args: List(String)) -> Result(types.Command, String) {
-parse_resume_flags(args, types.LatestRun, False)
+parse_resume_flags(args, types.LatestRun, False, False)
}

fn parse_resume_flags(
args: List(String),
run: types.RunSelector,
ui_enabled: Bool,
+explain_only: Bool,
) -> Result(types.Command, String) {
case args {
-[] -> Ok(types.Resume(run, ui_enabled))
+[] ->
+case ui_enabled && explain_only {
+True -> Error("`resume --explain` cannot be combined with `--ui`.")
+False -> Ok(types.Resume(run, ui_enabled, explain_only))
+}
+["--run", "latest", ..rest] ->
+parse_resume_flags(rest, types.LatestRun, ui_enabled, explain_only)
+["--run", run_id, ..rest] ->
+parse_resume_flags(rest, types.RunId(run_id), ui_enabled, explain_only)
+["--ui", ..rest] -> parse_resume_flags(rest, run, True, explain_only)
+["--explain", ..rest] ->
+parse_resume_flags(rest, run, ui_enabled, True)
+[flag, ..] -> Error("Unsupported flag: " <> flag)
+}
+}
+
+fn parse_provenance(args: List(String)) -> Result(types.Command, String) {
+parse_provenance_flags(args, types.LatestRun, None, types.ProvenanceMarkdown)
+}
+
+fn parse_provenance_flags(
+args: List(String),
+run: types.RunSelector,
+task_id: Option(String),
+format: types.ProvenanceFormat,
+) -> Result(types.Command, String) {
+case args {
+[] -> Ok(types.Provenance(run, task_id, format))
["--run", "latest", ..rest] ->
-parse_resume_flags(rest, types.LatestRun, ui_enabled)
+parse_provenance_flags(rest, types.LatestRun, task_id, format)
["--run", run_id, ..rest] ->
-parse_resume_flags(rest, types.RunId(run_id), ui_enabled)
-["--ui", ..rest] -> parse_resume_flags(rest, run, True)
+parse_provenance_flags(rest, types.RunId(run_id), task_id, format)
+["--task", next_task_id, ..rest] ->
+parse_provenance_flags(rest, run, Some(next_task_id), format)
+["--format", "json", ..rest] ->
+parse_provenance_flags(rest, run, task_id, types.ProvenanceJson)
+["--format", "md", ..rest] ->
+parse_provenance_flags(rest, run, task_id, types.ProvenanceMarkdown)
+["--format", raw_format, ..] ->
+Error("Unsupported provenance format: " <> raw_format)
[flag, ..] -> Error("Unsupported flag: " <> flag)
}
}