diff --git a/briefs/prep-clarification-escape-hatch.md b/briefs/prep-clarification-escape-hatch.md new file mode 100644 index 00000000..83d30177 --- /dev/null +++ b/briefs/prep-clarification-escape-hatch.md @@ -0,0 +1,87 @@ +# Prep clarification escape hatch — surface blocking ambiguities to the human + +Spec shorthand: `partnered/full` (default depth). Codebase touchpoints already mapped below — no `--with-prep` needed. + +## Outcome + +Give the `prep` phase a first-class way to (1) record open questions / ambiguities structurally as a category distinct from findings, and (2) **optionally halt the run and hand genuinely blocking questions back to a human** instead of silently guessing. The halt behavior is **default-ON** ("prep may ask"); a flag opts out, and the decision doc tells users to opt out on cloud / unattended runs where no human is present to answer. + +A reviewer checks: a prep run that hits a blocking ambiguity pauses the plan into `AWAITING_HUMAN` with the questions surfaced; the same run with the opt-out flag proceeds, recording the questions as explicit assumptions; non-blocking questions never halt under either setting. + +## Background — why + +Today prep already *notices* ambiguity: `_assemble_prep_outputs` writes `gap_notes` and `contradiction_notes` into `prep_metrics.json`, and the distill prompt instructs the model to populate them "when the findings disagree, time out, error, or leave concrete uncertainty." **But those notes live only in the metrics sidecar** — `distill_prep()` filters its payload down to `PREP_COMPATIBLE_KEYS`, which drops them, so nothing downstream (`plan`) ever reads them. Net effect: prep sees the ambiguity, writes it to a file nobody reads, and the planner proceeds to guess. This is the known "prep silently guesses scope instead of failing loud" gap. + +The halt-and-ask primitive already exists but is wired only to the plan phase: `STATE_AWAITING_HUMAN` (an automation-terminal state the auto-driver cleanly pauses on) plus the `ClarificationRecord` type (`refined_idea / intent_summary / questions`) stored at `state["clarification"]`. This work threads prep into that existing machinery — it does not invent a new state, a new return channel, or a new interactive loop. + +## Scope (IN) + +1. **Structured `open_questions` in prep output.** Add an `open_questions[]` field to the prep payload, carried all the way to `plan` (i.e. added to `PREP_COMPATIBLE_KEYS` and the `prep.json` schema). Each entry is classified by severity: + - `blocking` — a genuine ambiguity that changes the plan and the model cannot responsibly resolve alone. + - `assume_and_proceed` — uncertainty the model resolved by making an explicit, recorded assumption (carries an `assumption` string). + Source these from the existing `gap_notes` / `contradiction_notes` signal the distill step already produces; do not build a parallel detection mechanism. + +2. **Distill prompt update.** Instruct the distill step to emit `open_questions`, classify each by severity, and for `assume_and_proceed` state the assumption it made. Keep the existing gap/contradiction notes in the metrics sidecar as-is. + + **Consumption disposition (mirror the critique "flags are hypotheses, not verdicts" stance).** Whatever consumes the prep output — the `plan` prompt, and the operator reading a paused run — must be told explicitly that **not everything prep flags is meaningful or relevant**, and to apply discretion about what to act on. A prep `open_question` (blocking or not) is a *candidate concern*, not a mandate: the planner weighs each on its merits, acts on the material ones, and dismisses the immaterial / irrelevant / already-resolved ones with a reason. Question *count* is not itself a signal. Add this framing to the `plan` prompt where it ingests prep output (and to the surfaced text on a paused run), parallel to how the existing critique guidance frames flags. This applies even to `blocking` questions: a human answering a paused run should understand they may legitimately judge a flagged "blocker" to be a non-issue and resume. + +3. **Default-ON "may ask" halt.** When prep finishes with ≥1 `blocking` question **and** clarify is enabled (the default), transition the plan to `STATE_AWAITING_HUMAN`, writing the blocking questions into `ClarificationRecord.questions` (reuse the existing type and `state["clarification"]` slot). Add the required `prep → AWAITING_HUMAN` transition to the workflow state machine. Non-blocking questions never trigger this. + +4. **Opt-out flag, default enabled.** Add `--no-prep-clarify` (disables the halt). Default = enabled (prep may ask). Persist the setting into `state.config` at `init` so the prep handler reads it. Wire it through `init` and accept it as a per-milestone field for `chain`. Optionally honor a `[defaults]` config key (e.g. `prep_clarify = false`) so a cloud box can opt out once — nice-to-have, not required. + +5. **Auto-driver paused outcome surfaces the questions.** The auto-driver already exits cleanly on `AWAITING_HUMAN`; ensure its paused `DriverOutcome.reason` (or log) includes the blocking questions so the operator sees *what* to answer without digging into state.json. + +6. **Resume path documented (reuse existing verbs).** The human answers by injecting guidance via the existing `override add-note`, then resumes the plan. Document this loop. Do **not** build a new `clarify --answer` command. + +7. **Decision-doc update** (`megaplan/data/decision_skill.md`, the symlinked source behind the `megaplan-decision` skill). Document the default-ON behavior, the `--no-prep-clarify` flag, and explicit guidance: **opt out on cloud / unattended runs** (no human to answer → a blocking question would just strand the run at `AWAITING_HUMAN`). Place it in the Prep optional-phase section and add the modifier to the flag list. + +## Scope (OUT) / anti-scope + +- **Do not** add a new interactive CLI command for answering questions; reuse `override add-note` + resume. +- **Do not** add cloud auto-detection — the cloud opt-out is *documented guidance* + the flag, not runtime detection. +- **Do not** make the auto-driver block/wait on input; it must keep its non-interactive contract (pause-and-exit only). +- **Do not** rework the plan-phase clarification path beyond reusing `ClarificationRecord`. +- **Do not** redesign the gap/contradiction capture in `prep_metrics.json`; source from it, leave it intact. +- **Do not** change prep's 3-step pipeline shape (triage → fan-out → distill) or its model routing. + +## Locked decisions + +- Reuse `STATE_AWAITING_HUMAN` and `ClarificationRecord` — no new state or type. +- Halt is **default-ON**; opt-out flag is `--no-prep-clarify`. +- Severity gating: only `blocking` halts; `assume_and_proceed` rides forward as a recorded assumption visible to `plan`. +- Cloud opt-out is documentation + flag, not auto-detection. +- Resume = `override add-note` + resume; no new command. + +## Open questions for the planner + +- Exact field name/shape for severity classification and the assumption string within `open_questions[]` — pick something consistent with existing prep schema conventions. +- Whether the `[defaults].prep_clarify` config key is worth including now or deferred — planner's call; it's optional. + +## Constraints + +- The auto-driver must remain strictly non-interactive (pause-and-exit, never block on stdin). +- Backward-compat: existing prep outputs / `prep.json` consumers must not break when `open_questions` is absent or empty (treat as no questions). +- A run with `--no-prep-clarify` must behave exactly like today's halt-free flow, except that blocking questions are now recorded (as assumptions/notes) rather than dropped. + +## Done criteria + +- Tests: + - prep yielding a `blocking` question with clarify enabled → plan transitions to `AWAITING_HUMAN`, questions present in `ClarificationRecord`. + - same with `--no-prep-clarify` → plan proceeds past prep; blocking question recorded (not dropped, not halting). + - `assume_and_proceed` questions never cause a halt under either setting. + - `prep.json` schema validates `open_questions[]`; absence/empty is valid. + - auto-driver paused outcome/log includes the blocking question text. +- `decision_skill.md` updated with flag + cloud opt-out guidance. +- Existing prep/auto test suites still pass. + +## Touchpoints (file:line from current tree) + +- `megaplan/orchestration/prep_research.py` — `PREP_COMPATIBLE_KEYS` (~32–41), `distill_prep()` (~904–944), `_assemble_prep_outputs` gap/contradiction notes (~469–474), `prep.json` assembly (~1011, the `_artifact_json(plan_dir, "prep.json", …)` write). +- `megaplan/schemas/runtime.py` — the `prep.json` schema is defined inline here (~98–153, under the `"prep.json"` key); add `open_questions[]` there. (There is no `.megaplan/schemas/prep.json` file.) +- `megaplan/prompts/planning.py` — `_prep_distill_prompt` distill prompt (~475–531), and the `plan` prompt's prep-ingestion section (add the "use discretion; not every flag is meaningful" framing here, parallel to the existing critique-flag guidance). +- `megaplan/types.py` — `ClarificationRecord` (~130–133), state constants & `AUTOMATION_TERMINAL_STATES` (~13–37). +- `megaplan/_core/workflow_data.py` — transitions (~76–78); add `prep → AWAITING_HUMAN`. +- `megaplan/auto.py` — paused-outcome handling for `AWAITING_HUMAN` (~1143–1178). +- `megaplan/handlers/override.py` — `_override_add_note()` (~133–168) for the documented resume loop (reference only). +- CLI init + chain milestone parsing — `megaplan/cli.py` (existing prep flags ~L3114/L3710) and `megaplan/chain.py` (`_init_plan` ~L1441); wire `--no-prep-clarify` and per-milestone field; persist into `state.config`. +- `megaplan/data/decision_skill.md` — doc update. diff --git a/megaplan/_core/user_config.py b/megaplan/_core/user_config.py index d72fbb9c..6c91213a 100644 --- a/megaplan/_core/user_config.py +++ b/megaplan/_core/user_config.py @@ -72,4 +72,20 @@ def default_vendor(home: Path | None = None) -> str: return "claude" -__all__ = ["VALID_VENDORS", "default_vendor", "load_user_config_toml"] +def default_prep_clarify(home: Path | None = None) -> bool: + """Return the configured default for prep_clarify, falling back to ``True``. + + Reads ``[defaults].prep_clarify`` from the user's ``config.toml``. + Only boolean values are accepted; anything else falls back to ``True``. + """ + data = load_user_config_toml(home) + defaults = data.get("defaults") + if not isinstance(defaults, dict): + return True + prep_clarify = defaults.get("prep_clarify") + if isinstance(prep_clarify, bool): + return prep_clarify + return True + + +__all__ = ["VALID_VENDORS", "default_prep_clarify", "default_vendor", "load_user_config_toml"] diff --git a/megaplan/_core/workflow_data.py b/megaplan/_core/workflow_data.py index 0ec1212e..b5b1af1c 100644 --- a/megaplan/_core/workflow_data.py +++ b/megaplan/_core/workflow_data.py @@ -43,6 +43,7 @@ class Transition: WORKFLOW: dict[str, list[Transition]] = { STATE_INITIALIZED: [ Transition("prep", STATE_PREPPED), + Transition("prep", STATE_AWAITING_HUMAN), ], STATE_PREPPED: [ Transition("plan", STATE_PLANNED), @@ -75,6 +76,7 @@ class Transition: ], STATE_AWAITING_HUMAN: [ Transition("verify-human", STATE_DONE), + Transition("resume-clarify", STATE_PREPPED), ], STATE_TIEBREAKER_PENDING: [ Transition("tiebreaker-run", STATE_TIEBREAKER_READY), diff --git a/megaplan/auto.py b/megaplan/auto.py index 310ec1bb..6d9380d0 100644 --- a/megaplan/auto.py +++ b/megaplan/auto.py @@ -897,6 +897,19 @@ def _clear_orphaned_active_step(plan_dir: Path | None, expected_step: str) -> bo return True +def _plan_clarification(plan_dir: Path | None) -> dict[str, Any] | None: + """Read the ``clarification`` field from the plan state file, if any.""" + if plan_dir is None: + return None + try: + state_data = json.loads((plan_dir / "state.json").read_text(encoding="utf-8")) + except (OSError, json.JSONDecodeError, ValueError): + return None + if not isinstance(state_data, dict): + return None + return state_data.get("clarification") + + def drive( plan: str, *, @@ -1141,12 +1154,31 @@ def _outcome( # Terminal: plan reached a final state (or automation-terminal). if state in AUTOMATION_TERMINAL_STATES and not (state == STATE_BLOCKED and valid_next): if state == STATE_AWAITING_HUMAN: - log("plan awaiting human verification — automation stopping") + # Distinguish prep-sourced halts (blocking ambiguities) from + # criteria-verification halts. Prep halts include the blocking + # questions and the resume loop so the operator can act. + clarification = _plan_clarification(plan_dir) + if isinstance(clarification, dict) and clarification.get("source") == "prep": + questions = clarification.get("questions") or [] + question_list = "\n".join(f" - {q}" for q in questions) + hint = ( + "answer via `megaplan override add-note \"…\"` and resume via " + "`megaplan override resume-clarify`" + ) + reason = ( + f"prep surfaced {len(questions)} blocking " + f"ambiguities; a human may judge a flagged blocker a non-issue.\n" + f"blocking questions:\n{question_list}\n{hint}" + ) + log(f"plan awaiting human clarification ({len(questions)} blocking questions) — automation stopping") + else: + log("plan awaiting human verification — automation stopping") + reason = "plan has criteria requiring human verification" return _outcome( "awaiting_human", final_state=state, iterations=iteration, - reason="plan has criteria requiring human verification", + reason=reason, last_phase=last_phase, ) if state == STATE_TIEBREAKER_PENDING: diff --git a/megaplan/chain.py b/megaplan/chain.py index f9f88fe8..e20ca0aa 100644 --- a/megaplan/chain.py +++ b/megaplan/chain.py @@ -161,6 +161,7 @@ class MilestoneSpec: deepseek_provider: str | None = None with_prep: bool = False with_feedback: bool = False + prep_clarify: bool = True prep_direction: str | None = None phase_model: list[str] = field(default_factory=list) bakeoff: dict[str, Any] | None = None @@ -211,6 +212,13 @@ def from_dict(cls, raw: dict[str, Any], index: int) -> "MilestoneSpec": ) with_prep = _optional_bool(raw, "with_prep", index=index) with_feedback = _optional_bool(raw, "with_feedback", index=index) + prep_clarify_raw = raw.get("prep_clarify") + if prep_clarify_raw is None: + prep_clarify = True + elif isinstance(prep_clarify_raw, bool): + prep_clarify = prep_clarify_raw + else: + raise CliError("invalid_spec", f"milestones[{index}].prep_clarify must be a boolean") prep_direction_raw = raw.get("prep_direction") if prep_direction_raw is None: prep_direction = None @@ -252,6 +260,7 @@ def from_dict(cls, raw: dict[str, Any], index: int) -> "MilestoneSpec": deepseek_provider=deepseek_provider, with_prep=with_prep, with_feedback=with_feedback, + prep_clarify=prep_clarify, prep_direction=prep_direction, phase_model=phase_model, bakeoff=bakeoff, @@ -1451,6 +1460,7 @@ def _init_plan( deepseek_provider: str | None = None, with_prep: bool = False, with_feedback: bool = False, + prep_clarify: bool = True, prep_direction: str | None = None, phase_model: list[str] | None = None, writer, @@ -1480,6 +1490,8 @@ def _init_plan( args.append("--with-prep") if with_feedback: args.append("--with-feedback") + if not prep_clarify: + args.append("--no-prep-clarify") if prep_direction: args.extend(["--prep-direction", prep_direction]) for override in phase_model or []: @@ -1910,6 +1922,7 @@ def log(msg: str, **fields: Any) -> None: deepseek_provider=milestone.deepseek_provider, with_prep=milestone.with_prep, with_feedback=milestone.with_feedback, + prep_clarify=milestone.prep_clarify, prep_direction=milestone.prep_direction, phase_model=milestone.phase_model, writer=writer, diff --git a/megaplan/cli.py b/megaplan/cli.py index 2aecd80f..5aa106cd 100644 --- a/megaplan/cli.py +++ b/megaplan/cli.py @@ -3122,6 +3122,15 @@ def _add_vendor_critic_args(parser: argparse.ArgumentParser) -> None: "APIs, research-heavy briefs, or ambiguous requirements. " "Redundant on --robustness thorough|extreme (no-op).", ) + parser.add_argument( + "--no-prep-clarify", + action="store_false", + dest="prep_clarify", + default=True, + help="Disable the prep clarification gate. When set, prep blocking " + "questions are recorded in prep.json but do not halt the run; " + "planning proceeds immediately. Default: enabled.", + ) parser.add_argument( "--with-feedback", action="store_true", @@ -3807,6 +3816,7 @@ def build_parser() -> argparse.ArgumentParser: "add-note", "replan", "recover-blocked", + "resume-clarify", "set-robustness", "set-profile", "set-model", diff --git a/megaplan/data/_codex_skills/megaplan-decision/SKILL.md b/megaplan/data/_codex_skills/megaplan-decision/SKILL.md index 8f08014b..31aab57e 100644 --- a/megaplan/data/_codex_skills/megaplan-decision/SKILL.md +++ b/megaplan/data/_codex_skills/megaplan-decision/SKILL.md @@ -186,6 +186,7 @@ Default to `low`; only spend on depth when you can name the specific reason the - `megaplan override set-robustness --robustness LEVEL --plan ID` — same for the planning-complexity dial. - `megaplan override replan --plan ID` — back up to planning and redo with whatever models / robustness are now active. - `megaplan override add-note --plan ID --note "..."` — inject guidance into an active plan without restarting any phase. Read by every subsequent phase. The brief is snapshotted at `init`; later edits to the idea-file are NOT re-read, so this is the verb for "I missed something." **`megaplan feedback` is end-of-run rating, not in-flight guidance** — common confusion. +- `megaplan override resume-clarify --plan ID` — resume a run halted at `awaiting_human_verify` because prep surfaced blocking ambiguities. Answer the questions first via `override add-note`, then run this to advance to the plan phase. Only valid for prep-sourced halts; criteria-verification halts use `verify-human`. Lean on these instead of inventing new profile names. If you find yourself thinking "I want a profile that's *like* `partnered` but with X" — the answer is almost always `partnered` plus an override, not a new profile. @@ -224,6 +225,25 @@ distill = "hermes:deepseek:deepseek-v4-pro" # connects across areas, read-only Rationale: triage is the highest-leverage step (a bad route starves everything downstream), so it uses DeepSeek Pro by default; fan-out is the cost lever (flash × up to 10 ≪ pro × 10); distill must reconcile across areas. Explicit `claude:` and `shannon:` prep model entries are rejected until real read-only runners exist. Under `--vendor codex`, a resolved Codex flat prep route switches triage/distill to the Codex read-only runner; fan-out stays on the cheap Hermes/DeepSeek workers. Design + status: `briefs/prep-fanout-research-dossier.md`. +**Prep clarification ("prep may ask").** When prep runs and discovers genuine ambiguities it cannot responsibly resolve alone, it surfaces them as **blocking questions** that pause the run at `awaiting_human_verify` before the plan phase begins. This is **on by default** — prep is allowed to ask. Blocking questions are candidate concerns, not verdicts: a human may judge a flagged blocker a non-issue and resume immediately. + +Each blocking question is presented with the question text and the reason it was classified as blocking (not an `assume_and_proceed` with a stated assumption). The operator answers via the existing `override add-note` mechanism, then resumes with `override resume-clarify` — the plan returns to `PREPPED` and the planner phase runs with both the prep output and the human's answers in context. + +**Opting out of prep clarification.** On cloud CI or unattended runs where no human is available, a blocking question strands the run at `awaiting_human_verify` indefinitely. Disable with `--no-prep-clarify` at init, or set `prep_clarify = false` in `[defaults]` in `~/.config/megaplan/config.toml`: + +```toml +[defaults] +prep_clarify = false # never halt for prep questions (CI / unattended) +``` + +The CLI flag wins over the config default. When prep clarification is disabled, prep still writes `open_questions` into `prep.json` (both blocking and `assume_and_proceed` items) — the planner sees them as hints — but no question halts the run. + +Concrete resume loop: +1. Run reaches `awaiting_human_verify` with prep blocking questions. +2. Operator reads the questions, judges which are material. +3. `megaplan override add-note --plan --note ""` +4. `megaplan override resume-clarify --plan ` → plan returns to `PREPPED` and continues. + ### Feedback (`--with-feedback`) > **"Do you want a per-stage ratings template waiting on disk when the run finishes?"** @@ -276,6 +296,7 @@ The invocation has three layers: three flags for the dials, four modifiers for o - **`--deepseek-provider fireworks|direct`** — swaps canonical DeepSeek v4-pro slots between Fireworks and DeepSeek's direct API. Defaults to `direct`; use `fireworks` as the explicit secondary/fallback route. - **`--with-prep`** — force the `prep` research phase into the workflow regardless of `--robustness`. Off by default; no-op at `thorough`/`extreme`. See "Optional phases" above. - **`--prep-direction "…"`** — steering text shown to the prep worker (when prep runs) as a "User direction for prep" section. Points prep at specific files / subsystems / questions to explore. Can also be set or replaced later with `megaplan prep --direction "…"` before the phase runs. No-op if prep is skipped. See "Optional phases" above. +- **`--no-prep-clarify`** — disable prep clarification halts. When prep surfaces blocking ambiguities, the run pauses at `awaiting_human_verify` by default so a human can answer; on cloud CI or unattended runs where no human is available, pass this flag to let the planner proceed with the prep output as hints instead. Also settable as `prep_clarify = false` in `[defaults]` in config. See "Prep clarification" above. - **`--with-feedback`** — force the `feedback` phase into the workflow regardless of `--robustness`. Scaffolds `feedback.md` (a per-stage ratings template) between `review` and `done`, then completes the plan non-interactively. Off by default. See "Optional phases" above. ### The escape hatch diff --git a/megaplan/data/decision_skill.md b/megaplan/data/decision_skill.md index 8f08014b..31aab57e 100644 --- a/megaplan/data/decision_skill.md +++ b/megaplan/data/decision_skill.md @@ -186,6 +186,7 @@ Default to `low`; only spend on depth when you can name the specific reason the - `megaplan override set-robustness --robustness LEVEL --plan ID` — same for the planning-complexity dial. - `megaplan override replan --plan ID` — back up to planning and redo with whatever models / robustness are now active. - `megaplan override add-note --plan ID --note "..."` — inject guidance into an active plan without restarting any phase. Read by every subsequent phase. The brief is snapshotted at `init`; later edits to the idea-file are NOT re-read, so this is the verb for "I missed something." **`megaplan feedback` is end-of-run rating, not in-flight guidance** — common confusion. +- `megaplan override resume-clarify --plan ID` — resume a run halted at `awaiting_human_verify` because prep surfaced blocking ambiguities. Answer the questions first via `override add-note`, then run this to advance to the plan phase. Only valid for prep-sourced halts; criteria-verification halts use `verify-human`. Lean on these instead of inventing new profile names. If you find yourself thinking "I want a profile that's *like* `partnered` but with X" — the answer is almost always `partnered` plus an override, not a new profile. @@ -224,6 +225,25 @@ distill = "hermes:deepseek:deepseek-v4-pro" # connects across areas, read-only Rationale: triage is the highest-leverage step (a bad route starves everything downstream), so it uses DeepSeek Pro by default; fan-out is the cost lever (flash × up to 10 ≪ pro × 10); distill must reconcile across areas. Explicit `claude:` and `shannon:` prep model entries are rejected until real read-only runners exist. Under `--vendor codex`, a resolved Codex flat prep route switches triage/distill to the Codex read-only runner; fan-out stays on the cheap Hermes/DeepSeek workers. Design + status: `briefs/prep-fanout-research-dossier.md`. +**Prep clarification ("prep may ask").** When prep runs and discovers genuine ambiguities it cannot responsibly resolve alone, it surfaces them as **blocking questions** that pause the run at `awaiting_human_verify` before the plan phase begins. This is **on by default** — prep is allowed to ask. Blocking questions are candidate concerns, not verdicts: a human may judge a flagged blocker a non-issue and resume immediately. + +Each blocking question is presented with the question text and the reason it was classified as blocking (not an `assume_and_proceed` with a stated assumption). The operator answers via the existing `override add-note` mechanism, then resumes with `override resume-clarify` — the plan returns to `PREPPED` and the planner phase runs with both the prep output and the human's answers in context. + +**Opting out of prep clarification.** On cloud CI or unattended runs where no human is available, a blocking question strands the run at `awaiting_human_verify` indefinitely. Disable with `--no-prep-clarify` at init, or set `prep_clarify = false` in `[defaults]` in `~/.config/megaplan/config.toml`: + +```toml +[defaults] +prep_clarify = false # never halt for prep questions (CI / unattended) +``` + +The CLI flag wins over the config default. When prep clarification is disabled, prep still writes `open_questions` into `prep.json` (both blocking and `assume_and_proceed` items) — the planner sees them as hints — but no question halts the run. + +Concrete resume loop: +1. Run reaches `awaiting_human_verify` with prep blocking questions. +2. Operator reads the questions, judges which are material. +3. `megaplan override add-note --plan --note ""` +4. `megaplan override resume-clarify --plan ` → plan returns to `PREPPED` and continues. + ### Feedback (`--with-feedback`) > **"Do you want a per-stage ratings template waiting on disk when the run finishes?"** @@ -276,6 +296,7 @@ The invocation has three layers: three flags for the dials, four modifiers for o - **`--deepseek-provider fireworks|direct`** — swaps canonical DeepSeek v4-pro slots between Fireworks and DeepSeek's direct API. Defaults to `direct`; use `fireworks` as the explicit secondary/fallback route. - **`--with-prep`** — force the `prep` research phase into the workflow regardless of `--robustness`. Off by default; no-op at `thorough`/`extreme`. See "Optional phases" above. - **`--prep-direction "…"`** — steering text shown to the prep worker (when prep runs) as a "User direction for prep" section. Points prep at specific files / subsystems / questions to explore. Can also be set or replaced later with `megaplan prep --direction "…"` before the phase runs. No-op if prep is skipped. See "Optional phases" above. +- **`--no-prep-clarify`** — disable prep clarification halts. When prep surfaces blocking ambiguities, the run pauses at `awaiting_human_verify` by default so a human can answer; on cloud CI or unattended runs where no human is available, pass this flag to let the planner proceed with the prep output as hints instead. Also settable as `prep_clarify = false` in `[defaults]` in config. See "Prep clarification" above. - **`--with-feedback`** — force the `feedback` phase into the workflow regardless of `--robustness`. Scaffolds `feedback.md` (a per-stage ratings template) between `review` and `done`, then completes the plan non-interactively. Off by default. See "Optional phases" above. ### The escape hatch diff --git a/megaplan/handlers/init.py b/megaplan/handlers/init.py index ff29cbbf..e8feb08b 100644 --- a/megaplan/handlers/init.py +++ b/megaplan/handlers/init.py @@ -396,6 +396,14 @@ def handle_init(root: Path, args: argparse.Namespace) -> StepResponse: state["config"]["with_prep"] = True if getattr(args, "with_feedback", False): state["config"]["with_feedback"] = True + # Resolve prep_clarify: CLI > [defaults] > True. + # Only written to config when False to keep state lean (absent == True). + if not getattr(args, "prep_clarify", True): + state["config"]["prep_clarify"] = False + else: + from megaplan._core.user_config import default_prep_clarify + if not default_prep_clarify(): + state["config"]["prep_clarify"] = False prep_direction_raw = getattr(args, "prep_direction", None) if prep_direction_raw is not None: prep_direction = str(prep_direction_raw).strip() diff --git a/megaplan/handlers/override.py b/megaplan/handlers/override.py index 5b024e5a..d705b864 100644 --- a/megaplan/handlers/override.py +++ b/megaplan/handlers/override.py @@ -8,6 +8,7 @@ CliError, PlanState, STATE_ABORTED, + STATE_AWAITING_HUMAN, STATE_BLOCKED, STATE_CRITIQUED, STATE_DONE, @@ -16,6 +17,7 @@ STATE_FINALIZED, STATE_GATED, STATE_PLANNED, + STATE_PREPPED, STATE_REVIEWED, StepResponse, DEFAULT_AGENT_ROUTING, @@ -781,6 +783,56 @@ def _infer_phase_agent(phase: str, state: PlanState, root: Path) -> str | None: return DEFAULT_AGENT_ROUTING.get(phase) +def _override_resume_clarify( + root: Path, plan_dir: Path, state: PlanState, args: argparse.Namespace +) -> StepResponse: + if state["current_state"] != STATE_AWAITING_HUMAN: + raise CliError( + "invalid_transition", + f"resume-clarify requires state '{STATE_AWAITING_HUMAN}', got '{state['current_state']}'", + valid_next=infer_next_steps(state), + ) + if state.get("clarification", {}).get("source") != "prep": + raise CliError( + "invalid_transition", + "resume-clarify can only resume a prep-sourced clarification halt; " + "use verify-human for criteria-verification awaiting_human states", + valid_next=infer_next_steps(state), + ) + notes = state.get("meta", {}).get("notes") or [] + user_notes = [n for n in notes if isinstance(n, dict) and n.get("source", "user") == "user"] + warnings: list[str] = [] + if not user_notes: + warnings.append( + "No answers found in notes; consider adding answers via " + "'override add-note' before the plan phase." + ) + state["current_state"] = STATE_PREPPED + _append_to_meta( + state, + "overrides", + {"action": "resume-clarify", "timestamp": now_utc()}, + ) + save_state_merge_meta(plan_dir, state) + try: + from megaplan.observability.events import emit, EventKind + emit(EventKind.OVERRIDE_APPLIED, plan_dir=plan_dir, payload={"action": "resume-clarify"}) + except Exception: + pass + next_steps = infer_next_steps(state) + response: StepResponse = { + "success": True, + "step": "override", + "summary": "Prep clarification resolved; plan phase is now ready to run.", + "next_step": next_steps[0] if next_steps else None, + "state": STATE_PREPPED, + } + if warnings: + response["warnings"] = warnings + _attach_next_step_runtime(response) + return response + + _OVERRIDE_ACTIONS: dict[ str, Callable[[Path, Path, PlanState, argparse.Namespace], StepResponse] ] = { @@ -789,6 +841,7 @@ def _infer_phase_agent(phase: str, state: PlanState, root: Path) -> str | None: "force-proceed": _override_force_proceed, "replan": _override_replan, "recover-blocked": _override_recover_blocked, + "resume-clarify": _override_resume_clarify, "set-robustness": _override_set_robustness, "set-profile": _override_set_profile, "set-model": _override_set_model, diff --git a/megaplan/handlers/plan.py b/megaplan/handlers/plan.py index 067fc17c..d658f97e 100644 --- a/megaplan/handlers/plan.py +++ b/megaplan/handlers/plan.py @@ -6,7 +6,7 @@ from typing import Any from megaplan import handlers as _pkg -from megaplan.types import CliError, MOCK_ENV_VAR, STATE_INITIALIZED, STATE_PLANNED, STATE_PREPPED, StepResponse +from megaplan.types import CliError, MOCK_ENV_VAR, STATE_AWAITING_HUMAN, STATE_INITIALIZED, STATE_PLANNED, STATE_PREPPED, StepResponse from megaplan._core import load_plan_locked, require_state from .shared import ( @@ -17,6 +17,31 @@ phase_result_guard, ) +def _apply_prep_clarify_gate(state: dict, payload: dict) -> str: + """Decide whether prep should halt for human clarification. + + Returns STATE_AWAITING_HUMAN when prep_clarify is enabled and at least one + blocking open_question exists; otherwise returns STATE_PREPPED. Mutates + state['clarification'] as a side effect on the halt path. + """ + if not state["config"].get("prep_clarify", True): + return STATE_PREPPED + open_questions = payload.get("open_questions") or [] + blocking = [q for q in open_questions if isinstance(q, dict) and q.get("severity") == "blocking"] + if not blocking: + return STATE_PREPPED + n = len(blocking) + state["clarification"] = { + "intent_summary": ( + f"prep surfaced {n} blocking {'ambiguity' if n == 1 else 'ambiguities'}; " + "answer and run `megaplan override resume-clarify`" + ), + "questions": [f"[blocking] {q['question']}" for q in blocking], + "source": "prep", + } + return STATE_AWAITING_HUMAN + + def handle_plan(root: Path, args: argparse.Namespace) -> StepResponse: with load_plan_locked(root, args.plan, step="plan") as (plan_dir, state): require_state(state, "plan", {STATE_INITIALIZED, STATE_PREPPED, STATE_PLANNED}) @@ -98,12 +123,21 @@ def handle_prep(root: Path, args: argparse.Namespace) -> StepResponse: state["config"]["primary_criterion"] = primary_criterion.strip() code_refs = len(worker.payload.get("relevant_code", [])) test_refs = len(worker.payload.get("test_expectations", [])) - state["current_state"] = STATE_PREPPED + next_state = _apply_prep_clarify_gate(state, worker.payload) + state["current_state"] = next_state + if next_state == STATE_AWAITING_HUMAN: + blocking_count = len(state["clarification"]["questions"]) + summary = ( + f"Prep halted: {blocking_count} blocking " + f"{'question' if blocking_count == 1 else 'questions'} require clarification before planning can proceed." + ) + else: + summary = f"Prep complete: captured {code_refs} relevant code reference(s) and {test_refs} test expectation(s)." return _finish_step( plan_dir, state, args, step="prep", worker=worker, agent=agent, mode=mode, refreshed=refreshed, - summary=f"Prep complete: captured {code_refs} relevant code reference(s) and {test_refs} test expectation(s).", + summary=summary, artifacts=[prep_filename], output_file=prep_filename, artifact_hash=artifact_hash, @@ -120,7 +154,16 @@ def handle_prep(root: Path, args: argparse.Namespace) -> StepResponse: primary_criterion = worker.payload.get("primary_criterion") if isinstance(primary_criterion, str) and primary_criterion.strip(): state["config"]["primary_criterion"] = primary_criterion.strip() - state["current_state"] = STATE_PREPPED + next_state = _apply_prep_clarify_gate(state, worker.payload) + state["current_state"] = next_state + if next_state == STATE_AWAITING_HUMAN: + blocking_count = len(state["clarification"]["questions"]) + summary = ( + f"Prep halted: {blocking_count} blocking " + f"{'question' if blocking_count == 1 else 'questions'} require clarification before planning can proceed." + ) + else: + summary = orchestration.summary return _finish_step( plan_dir, state, args, step="prep", @@ -128,7 +171,7 @@ def handle_prep(root: Path, args: argparse.Namespace) -> StepResponse: agent=orchestration.agent, mode=orchestration.mode, refreshed=orchestration.refreshed, - summary=orchestration.summary, + summary=summary, artifacts=orchestration.artifacts, output_file=prep_filename, artifact_hash=artifact_hash, diff --git a/megaplan/orchestration/prep_research.py b/megaplan/orchestration/prep_research.py index 3fc15d4b..e17b1c82 100644 --- a/megaplan/orchestration/prep_research.py +++ b/megaplan/orchestration/prep_research.py @@ -38,6 +38,7 @@ "constraints", "suggested_approach", "primary_criterion", + "open_questions", } PREP_RESEARCH_TOOLSETS = ["file-readonly", "web"] DEFAULT_PREP_RESEARCH_MAX_ITERATIONS = 12 diff --git a/megaplan/prompts/_shared.py b/megaplan/prompts/_shared.py index 14c4d289..b1a5b70a 100644 --- a/megaplan/prompts/_shared.py +++ b/megaplan/prompts/_shared.py @@ -283,6 +283,25 @@ def _cell(value: object) -> str: or "No suggested approach provided." ) + open_questions = prep.get("open_questions", []) + open_questions_section = "" + if isinstance(open_questions, list) and open_questions: + oq_lines = [] + for item in open_questions: + if not isinstance(item, dict): + continue + severity = str(item.get("severity", "")).strip() + question = str(item.get("question", "")).strip() + assumption = str(item.get("assumption", "")).strip() + if not question: + continue + if severity == "assume_and_proceed" and assumption: + oq_lines.append(f"- [{severity}] {question} _(assumption: {assumption})_") + else: + oq_lines.append(f"- [{severity}] {question}") + if oq_lines: + open_questions_section = "\n### Open Questions\n" + "\n".join(oq_lines) + prep_block = textwrap.dedent( f""" Engineering brief produced from the codebase and task details: @@ -304,6 +323,7 @@ def _cell(value: object) -> str: ### Suggested Approach {suggested_approach} + {open_questions_section} """ ).strip() prep_instruction = ( diff --git a/megaplan/prompts/planning.py b/megaplan/prompts/planning.py index b82c48c5..546b543e 100644 --- a/megaplan/prompts/planning.py +++ b/megaplan/prompts/planning.py @@ -189,6 +189,12 @@ def _prep_context_sections(state: PlanState, plan_dir: Path) -> tuple[Path, Path def _plan_prompt(state: PlanState, plan_dir: Path) -> str: project_dir = Path(state["config"]["project_dir"]) prep_block, prep_instruction = _render_prep_block(plan_dir) + if prep_instruction: + prep_instruction += ( + " Open questions in the brief are candidate concerns, not mandates —" + " act on ones that materially affect the plan, dismiss immaterial or" + " already-resolved ones with a brief reason, and treat question count as noise." + ) from_doc = state["config"].get("from_doc") imported_decisions = state["meta"].get("imported_decisions", []) clarification = state.get("clarification", {}) @@ -514,10 +520,15 @@ def _prep_distill_prompt( Produce: - A `prep.json` payload that keeps the public compatibility contract unchanged: - `skip`, `task_summary`, `key_evidence`, `relevant_code`, `test_expectations`, `constraints`, `suggested_approach`. + required: `skip`, `task_summary`, `key_evidence`, `relevant_code`, `test_expectations`, `constraints`, `suggested_approach`; + optional: `open_questions`. - Distilled evidence only. Treat the area findings as evidence to adjudicate, not as text to copy blindly. - Resolve overlaps across areas into one coherent prep view instead of duplicating them. - Clear contradiction or gap notes when the findings disagree, time out, error, or leave concrete uncertainty. + - For each gap or contradiction, classify it as an `open_questions[]` item: + - `"blocking"` — genuine ambiguity that would materially change the plan and cannot be responsibly resolved alone. + - `"assume_and_proceed"` — resolvable by a stated `assumption`; include the assumption text. + Omit `open_questions` entirely when no genuine gaps or contradictions exist. Rules: - Do not add new required fields to the compatible `prep.json` payload. diff --git a/megaplan/schemas/runtime.py b/megaplan/schemas/runtime.py index 7d2d450d..27c8edb3 100644 --- a/megaplan/schemas/runtime.py +++ b/megaplan/schemas/runtime.py @@ -140,6 +140,21 @@ "constraints": {"type": "array", "items": {"type": "string"}}, "primary_criterion": {"type": "string"}, "suggested_approach": {"type": "string"}, + "open_questions": { + "type": "array", + "items": { + "type": "object", + "properties": { + "severity": { + "type": "string", + "enum": ["blocking", "assume_and_proceed"], + }, + "question": {"type": "string"}, + "assumption": {"type": "string"}, + }, + "required": ["severity", "question"], + }, + }, }, "required": [ "skip", diff --git a/megaplan/types.py b/megaplan/types.py index ed8519ae..457973ec 100644 --- a/megaplan/types.py +++ b/megaplan/types.py @@ -56,6 +56,7 @@ class PlanConfig(TypedDict, total=False): tiebreaker_token_budget: int tiebreaker_time_budget_minutes: int strict_notes: NotRequired[bool] + prep_clarify: NotRequired[bool] class PlanMeta(TypedDict, total=False): @@ -131,6 +132,11 @@ class ClarificationRecord(TypedDict, total=False): refined_idea: str intent_summary: str questions: list[str] + # 'prep' when halt is from prep ambiguity; absent for criteria-verification halts. + # Convention: gate serializes blocking items as human-readable strings + # (e.g. "[blocking] "); structured data (severity/assumption) lives + # only in prep.json, not here. + source: str class LastGateRecord(TypedDict, total=False): diff --git a/tests/test_auto.py b/tests/test_auto.py index cd9a218f..2027022f 100644 --- a/tests/test_auto.py +++ b/tests/test_auto.py @@ -2407,3 +2407,103 @@ def fake_run(args, cwd=None, timeout=None, idle_timeout=None, progress_env=None, plan, ] ] + + +# --------------------------------------------------------------------------- +# T12(e): auto-driver prep-sourced AWAITING_HUMAN reason includes blocking questions +# --------------------------------------------------------------------------- + + +def _awaiting_human_status(plan: str, state: str = "awaiting_human_verify") -> dict: + return { + "success": True, + "step": "status", + "plan": plan, + "state": state, + "iteration": 1, + "summary": "Plan is awaiting human input.", + "next_step": None, + "valid_next": [], + } + + +def test_auto_driver_prep_awaiting_human_includes_blocking_questions( + tmp_path: Path, +) -> None: + """DriverOutcome.reason must include blocking question text for prep-sourced halts.""" + plan = "prep-halt-plan" + plan_dir = _make_plan_dir(tmp_path, plan) + (plan_dir / "state.json").write_text( + json.dumps( + { + "name": plan, + "current_state": "awaiting_human_verify", + "clarification": { + "intent_summary": "prep surfaced 2 blocking ambiguities", + "questions": [ + "[blocking] Which auth library?", + "[blocking] REST or GraphQL?", + ], + "source": "prep", + }, + } + ), + encoding="utf-8", + ) + + def fake_status(plan_name: str, cwd=None, timeout=60): + return _awaiting_human_status(plan_name) + + with patch.object(auto, "_status", side_effect=fake_status): + outcome = drive( + plan, + cwd=tmp_path, + max_iterations=1, + poll_sleep=0, + writer=lambda _m: None, + ) + + assert outcome.status == "awaiting_human" + assert "Which auth library?" in outcome.reason + assert "REST or GraphQL?" in outcome.reason + assert "override add-note" in outcome.reason + assert "override resume-clarify" in outcome.reason + + +def test_auto_driver_criteria_awaiting_human_has_generic_reason( + tmp_path: Path, +) -> None: + """Criteria-verification AWAITING_HUMAN must preserve the original generic reason.""" + plan = "criteria-halt-plan" + plan_dir = _make_plan_dir(tmp_path, plan) + (plan_dir / "state.json").write_text( + json.dumps( + { + "name": plan, + "current_state": "awaiting_human_verify", + "clarification": { + "intent_summary": "Criteria verification needed.", + "questions": ["Is the plan acceptable?"], + }, + } + ), + encoding="utf-8", + ) + + def fake_status(plan_name: str, cwd=None, timeout=60): + return _awaiting_human_status(plan_name) + + with patch.object(auto, "_status", side_effect=fake_status): + outcome = drive( + plan, + cwd=tmp_path, + max_iterations=1, + poll_sleep=0, + writer=lambda _m: None, + ) + + assert outcome.status == "awaiting_human" + # Generic reason, not prep-specific + assert "plan has criteria requiring human verification" in outcome.reason + assert "override add-note" not in outcome.reason + assert "override resume-clarify" not in outcome.reason diff --git a/tests/test_chain.py b/tests/test_chain.py index 04ae051c..5cdb7b3a 100644 --- a/tests/test_chain.py +++ b/tests/test_chain.py @@ -939,6 +939,7 @@ def fake_init( deepseek_provider=None, with_prep=False, with_feedback=False, + prep_clarify=True, prep_direction=None, phase_model=None, writer, @@ -1013,6 +1014,7 @@ def fake_init(root, idea_path, **kwargs): "deepseek_provider": "fireworks", "with_prep": True, "with_feedback": True, + "prep_clarify": True, "prep_direction": None, "phase_model": ["execute=claude:low"], "writer": ANY, @@ -1219,6 +1221,7 @@ def fake_init( deepseek_provider=None, with_prep=False, with_feedback=False, + prep_clarify=True, prep_direction=None, phase_model=None, writer, diff --git a/tests/test_prep.py b/tests/test_prep.py index dacabc23..8ee6ce4e 100644 --- a/tests/test_prep.py +++ b/tests/test_prep.py @@ -6,6 +6,8 @@ from typing import Any from unittest.mock import patch +import pytest + import megaplan import megaplan._core import megaplan._core.io as io_module @@ -436,3 +438,379 @@ def test_mocked_prep_orchestration_feeds_plan_prompt_without_changing_plan_artif assert plan_response["artifacts"] == ["plan_v1.md", "plan_v1.meta.json"] assert (plan_dir / "plan_v1.md").exists() assert (plan_dir / "plan_v1.meta.json").exists() + + +# --------------------------------------------------------------------------- +# T12: Prep clarification gate tests (MOCK path, MEGAPLAN_MOCK=1) +# --------------------------------------------------------------------------- + +def _mock_env(monkeypatch: Any) -> None: + """Set up MEGAPLAN_MOCK_WORKERS=1 and mock shutil.which for MOCK prep path.""" + monkeypatch.setenv(megaplan.MOCK_ENV_VAR, "1") + monkeypatch.setattr( + megaplan._core.shutil, + "which", + lambda name: "/usr/bin/mock" if name in {"claude", "codex"} else None, + ) + + +def _make_prep_worker(open_questions: list[dict[str, Any]] | None = None) -> WorkerResult: + """Build a deterministic prep worker payload with optional open_questions.""" + payload: dict[str, Any] = { + "skip": False, + "task_summary": "Research complete.", + "key_evidence": [], + "relevant_code": [], + "test_expectations": [], + "constraints": [], + "suggested_approach": "Proceed as planned.", + } + if open_questions is not None: + payload["open_questions"] = open_questions + return WorkerResult( + payload=payload, + raw_output="{}", + duration_ms=5, + cost_usd=0.0, + session_id="prep-mock", + ) + + +# (a) blocking question + clarify enabled → AWAITING_HUMAN +def test_prep_blocking_question_halts_at_awaiting_human( + tmp_path: Path, monkeypatch +) -> None: + root = tmp_path / "root" + project_dir = tmp_path / "project" + config_path = tmp_path / "config" + root.mkdir() + project_dir.mkdir() + (project_dir / ".git").mkdir() + + monkeypatch.setattr(io_module, "config_dir", lambda home=None: config_path) + monkeypatch.setattr(megaplan.cli, "config_dir", lambda home=None: config_path) + _mock_env(monkeypatch) + + init_response = megaplan.handle_init(root, _make_args(None, project_dir)) + plan_name = init_response["plan"] + plan_dir = megaplan.plans_root(root) / plan_name + + blocking = [ + { + "severity": "blocking", + "question": "Which auth library should we use?", + }, + { + "severity": "blocking", + "question": "Should the API be REST or GraphQL?", + "assumption": "REST is the safer default.", + }, + ] + worker = _make_prep_worker(open_questions=blocking) + + with patch("megaplan.handlers._run_worker", return_value=(worker, "claude", "ephemeral", True)): + response = handle_prep(root, _make_args(plan_name, project_dir, plan=plan_name)) + + assert response["state"] == "awaiting_human_verify" + assert response["summary"].startswith("Prep halted:") + assert "2 blocking questions" in response["summary"] + + state = megaplan._core.read_json(plan_dir / "state.json") + clarification = state.get("clarification") + assert clarification is not None + assert clarification["source"] == "prep" + assert len(clarification["questions"]) == 2 + assert "[blocking] Which auth library should we use?" in clarification["questions"] + assert "[blocking] Should the API be REST or GraphQL?" in clarification["questions"] + assert "resume-clarify" in clarification["intent_summary"] + + # prep.json should still contain open_questions + prep = json.loads((plan_dir / "prep.json").read_text(encoding="utf-8")) + assert prep.get("open_questions") == blocking + + +# (b) --no-prep-clarify → PREPPED, blocking question still in prep.json +def test_prep_blocking_question_no_prep_clarify_proceeds_to_prepped( + tmp_path: Path, monkeypatch +) -> None: + root = tmp_path / "root" + project_dir = tmp_path / "project" + config_path = tmp_path / "config" + root.mkdir() + project_dir.mkdir() + (project_dir / ".git").mkdir() + + monkeypatch.setattr(io_module, "config_dir", lambda home=None: config_path) + monkeypatch.setattr(megaplan.cli, "config_dir", lambda home=None: config_path) + _mock_env(monkeypatch) + + init_response = megaplan.handle_init(root, _make_args(None, project_dir)) + plan_name = init_response["plan"] + plan_dir = megaplan.plans_root(root) / plan_name + + # disable prep_clarify + state = megaplan._core.read_json(plan_dir / "state.json") + state["config"]["prep_clarify"] = False + megaplan._core.atomic_write_json(plan_dir / "state.json", state, _plan_dir=plan_dir) + + blocking = [{"severity": "blocking", "question": "Which auth library should we use?"}] + worker = _make_prep_worker(open_questions=blocking) + + with patch("megaplan.handlers._run_worker", return_value=(worker, "claude", "ephemeral", True)): + response = handle_prep(root, _make_args(plan_name, project_dir, plan=plan_name)) + + assert response["state"] == "prepped" + assert "Prep complete" in response["summary"] + + # blocking question must still be in prep.json + prep = json.loads((plan_dir / "prep.json").read_text(encoding="utf-8")) + assert prep.get("open_questions") == blocking + + # clarification must NOT be set + state = megaplan._core.read_json(plan_dir / "state.json") + assert "clarification" not in state + + +# (c) assume_and_proceed never halts under either setting +def test_prep_assume_and_proceed_never_halts( + tmp_path: Path, monkeypatch +) -> None: + root = tmp_path / "root" + project_dir = tmp_path / "project" + config_path = tmp_path / "config" + root.mkdir() + project_dir.mkdir() + (project_dir / ".git").mkdir() + + monkeypatch.setattr(io_module, "config_dir", lambda home=None: config_path) + monkeypatch.setattr(megaplan.cli, "config_dir", lambda home=None: config_path) + _mock_env(monkeypatch) + + init_response = megaplan.handle_init(root, _make_args(None, project_dir)) + plan_name = init_response["plan"] + + assume_items = [ + { + "severity": "assume_and_proceed", + "question": "Which cache backend?", + "assumption": "Redis is fine for now.", + }, + { + "severity": "assume_and_proceed", + "question": "Which serialization format?", + "assumption": "JSON is adequate.", + }, + ] + worker = _make_prep_worker(open_questions=assume_items) + + with patch("megaplan.handlers._run_worker", return_value=(worker, "claude", "ephemeral", True)): + response = handle_prep(root, _make_args(plan_name, project_dir, plan=plan_name)) + + assert response["state"] == "prepped" + assert "Prep complete" in response["summary"] + + +# (c-alt) assume_and_proceed + blocking mix only halts on blocking when clarify enabled +def test_prep_mixed_severity_only_blocking_gates( + tmp_path: Path, monkeypatch +) -> None: + root = tmp_path / "root" + project_dir = tmp_path / "project" + config_path = tmp_path / "config" + root.mkdir() + project_dir.mkdir() + (project_dir / ".git").mkdir() + + monkeypatch.setattr(io_module, "config_dir", lambda home=None: config_path) + monkeypatch.setattr(megaplan.cli, "config_dir", lambda home=None: config_path) + _mock_env(monkeypatch) + + init_response = megaplan.handle_init(root, _make_args(None, project_dir)) + plan_name = init_response["plan"] + plan_dir = megaplan.plans_root(root) / plan_name + + mixed = [ + {"severity": "assume_and_proceed", "question": "Which cache backend?", "assumption": "Redis."}, + {"severity": "blocking", "question": "Which auth library?"}, + ] + worker = _make_prep_worker(open_questions=mixed) + + with patch("megaplan.handlers._run_worker", return_value=(worker, "claude", "ephemeral", True)): + response = handle_prep(root, _make_args(plan_name, project_dir, plan=plan_name)) + + assert response["state"] == "awaiting_human_verify" + clarification = megaplan._core.read_json(plan_dir / "state.json").get("clarification") + assert len(clarification["questions"]) == 1 # only blocking + assert "[blocking] Which auth library?" in clarification["questions"][0] + + +# (f) resume-clarify from prep-sourced AWAITING_HUMAN succeeds +def test_override_resume_clarify_from_prep_source_succeeds( + tmp_path: Path, monkeypatch +) -> None: + root = tmp_path / "root" + project_dir = tmp_path / "project" + config_path = tmp_path / "config" + root.mkdir() + project_dir.mkdir() + (project_dir / ".git").mkdir() + + monkeypatch.setattr(io_module, "config_dir", lambda home=None: config_path) + monkeypatch.setattr(megaplan.cli, "config_dir", lambda home=None: config_path) + _mock_env(monkeypatch) + + init_response = megaplan.handle_init(root, _make_args(None, project_dir)) + plan_name = init_response["plan"] + plan_dir = megaplan.plans_root(root) / plan_name + + # Simulate a prep-sourced AWAITING_HUMAN state + blocking = [{"severity": "blocking", "question": "Which auth library?"}] + worker = _make_prep_worker(open_questions=blocking) + + with patch("megaplan.handlers._run_worker", return_value=(worker, "claude", "ephemeral", True)): + handle_prep(root, _make_args(plan_name, project_dir, plan=plan_name)) + + state = megaplan._core.read_json(plan_dir / "state.json") + assert state["current_state"] == "awaiting_human_verify" + assert state["clarification"]["source"] == "prep" + + # Now resume via override resume-clarify + resume_response = megaplan.handle_override( + root, + _make_args( + plan_name, + project_dir, + plan=plan_name, + override_action="resume-clarify", + ), + ) + + assert resume_response["success"] is True + assert resume_response["state"] == "prepped" + assert "Prep clarification resolved" in resume_response["summary"] + + # state.json should be back to prepped + state = megaplan._core.read_json(plan_dir / "state.json") + assert state["current_state"] == "prepped" + + +# (f-reject) resume-clarify rejected when source != 'prep' +def test_override_resume_clarify_rejects_non_prep_source( + tmp_path: Path, monkeypatch +) -> None: + root = tmp_path / "root" + project_dir = tmp_path / "project" + config_path = tmp_path / "config" + root.mkdir() + project_dir.mkdir() + (project_dir / ".git").mkdir() + + monkeypatch.setattr(io_module, "config_dir", lambda home=None: config_path) + monkeypatch.setattr(megaplan.cli, "config_dir", lambda home=None: config_path) + + init_response = megaplan.handle_init(root, _make_args(None, project_dir)) + plan_name = init_response["plan"] + plan_dir = megaplan.plans_root(root) / plan_name + + # Manually set state to AWAITING_HUMAN with a non-prep clarification source + state = megaplan._core.read_json(plan_dir / "state.json") + state["current_state"] = "awaiting_human_verify" + state["clarification"] = { + "intent_summary": "Criteria verification needed.", + "questions": ["Is the plan acceptable?"], + "source": "criteria", + } + megaplan._core.atomic_write_json(plan_dir / "state.json", state, _plan_dir=plan_dir) + + with pytest.raises(megaplan.CliError, match="resume-clarify can only resume a prep-sourced"): + megaplan.handle_override( + root, + _make_args( + plan_name, + project_dir, + plan=plan_name, + override_action="resume-clarify", + ), + ) + + +# (f-reject2) resume-clarify rejected when not in AWAITING_HUMAN +def test_override_resume_clarify_rejects_wrong_state( + tmp_path: Path, monkeypatch +) -> None: + root = tmp_path / "root" + project_dir = tmp_path / "project" + config_path = tmp_path / "config" + root.mkdir() + project_dir.mkdir() + (project_dir / ".git").mkdir() + + monkeypatch.setattr(io_module, "config_dir", lambda home=None: config_path) + monkeypatch.setattr(megaplan.cli, "config_dir", lambda home=None: config_path) + + init_response = megaplan.handle_init(root, _make_args(None, project_dir)) + plan_name = init_response["plan"] + plan_dir = megaplan.plans_root(root) / plan_name + + # State is initialized — not awaiting_human + state = megaplan._core.read_json(plan_dir / "state.json") + assert state["current_state"] == "initialized" + + with pytest.raises(megaplan.CliError, match="resume-clarify requires state"): + megaplan.handle_override( + root, + _make_args( + plan_name, + project_dir, + plan=plan_name, + override_action="resume-clarify", + ), + ) + + +# (h) [defaults] prep_clarify=false makes a fresh init disable clarify absent the CLI flag +def test_init_prep_clarify_defaults_to_true_when_absent( + tmp_path: Path, monkeypatch +) -> None: + """Fresh init without --no-prep-clarify should leave prep_clarify absent (= True).""" + root = tmp_path / "root" + project_dir = tmp_path / "project" + config_path = tmp_path / "config" + root.mkdir() + project_dir.mkdir() + (project_dir / ".git").mkdir() + + monkeypatch.setattr(io_module, "config_dir", lambda home=None: config_path) + monkeypatch.setattr(megaplan.cli, "config_dir", lambda home=None: config_path) + + init_response = megaplan.handle_init(root, _make_args(None, project_dir)) + plan_name = init_response["plan"] + plan_dir = megaplan.plans_root(root) / plan_name + state = megaplan._core.read_json(plan_dir / "state.json") + + # prep_clarify should be absent (meaning True via .get("prep_clarify", True)) + assert "prep_clarify" not in state["config"] + + +def test_init_prep_clarify_false_via_no_prep_clarify_flag( + tmp_path: Path, monkeypatch +) -> None: + """--no-prep-clarify flag must write prep_clarify=False to config.""" + root = tmp_path / "root" + project_dir = tmp_path / "project" + config_path = tmp_path / "config" + root.mkdir() + project_dir.mkdir() + (project_dir / ".git").mkdir() + + monkeypatch.setattr(io_module, "config_dir", lambda home=None: config_path) + monkeypatch.setattr(megaplan.cli, "config_dir", lambda home=None: config_path) + + init_response = megaplan.handle_init( + root, _make_args(None, project_dir, prep_clarify=False) + ) + plan_name = init_response["plan"] + plan_dir = megaplan.plans_root(root) / plan_name + state = megaplan._core.read_json(plan_dir / "state.json") + + assert state["config"].get("prep_clarify") is False diff --git a/tests/test_prep_research.py b/tests/test_prep_research.py index 9d147367..7494967d 100644 --- a/tests/test_prep_research.py +++ b/tests/test_prep_research.py @@ -595,3 +595,50 @@ def run_conversation(self, *, user_message: str, conversation_history: object = assert result["prompt_tokens"] == 5 assert result["completion_tokens"] == 6 assert result["total_tokens"] == 11 + + +# --------------------------------------------------------------------------- +# T12(g): _compatible_prep_payload preserves open_questions +# --------------------------------------------------------------------------- + + +def test_compatible_prep_payload_preserves_open_questions() -> None: + """open_questions must survive the _compatible_prep_payload filter.""" + payload = { + "skip": False, + "task_summary": "summary", + "key_evidence": [], + "relevant_code": [], + "test_expectations": [], + "constraints": [], + "suggested_approach": "approach", + "open_questions": [ + {"severity": "blocking", "question": "Which auth library?"}, + {"severity": "assume_and_proceed", "question": "Which cache?", "assumption": "Redis."}, + ], + # Extra key not in PREP_COMPATIBLE_KEYS + "extra_field": "should be stripped", + } + result = prep_research._compatible_prep_payload(payload) + assert "open_questions" in result + assert result["open_questions"] == payload["open_questions"] + assert "extra_field" not in result + # Verify other keys are preserved + assert result["skip"] is False + assert result["task_summary"] == "summary" + + +def test_compatible_prep_payload_handles_absent_open_questions() -> None: + """Payload without open_questions should pass through unchanged (minus extra keys).""" + payload = { + "skip": False, + "task_summary": "summary", + "key_evidence": [], + "relevant_code": [], + "test_expectations": [], + "constraints": [], + "suggested_approach": "approach", + } + result = prep_research._compatible_prep_payload(payload) + assert "open_questions" not in result + assert result["skip"] is False diff --git a/tests/test_profile_smoke.py b/tests/test_profile_smoke.py index b18c014b..c94d1e82 100644 --- a/tests/test_profile_smoke.py +++ b/tests/test_profile_smoke.py @@ -296,24 +296,28 @@ def test_apex_vendor_and_critic_are_silent_noops( def test_with_prep_adds_prep_to_standard_workflow() -> None: """At --robustness standard, STATE_INITIALIZED normally jumps - straight to plan. --with-prep should restore the prep transition.""" + straight to plan. --with-prep should restore the prep transition(s). + prep can land in PREPPED or AWAITING_HUMAN, so there may be multiple + transitions — all must have next_step == 'prep'.""" workflow = _workflow_for_robustness("standard", with_prep=True) transitions = workflow[STATE_INITIALIZED] next_steps = [t.next_step for t in transitions] - assert next_steps == ["prep"], ( + assert next_steps and set(next_steps) == {"prep"}, ( f"with_prep=True at standard should run prep first; got {next_steps}" ) def test_with_prep_adds_prep_to_light_workflow() -> None: workflow = _workflow_for_robustness("light", with_prep=True) - assert [t.next_step for t in workflow[STATE_INITIALIZED]] == ["prep"] + next_steps = [t.next_step for t in workflow[STATE_INITIALIZED]] + assert next_steps and set(next_steps) == {"prep"} def test_with_prep_is_noop_at_robust() -> None: """robust already includes prep; with_prep=True must not break it.""" workflow = _workflow_for_robustness("robust", with_prep=True) - assert [t.next_step for t in workflow[STATE_INITIALIZED]] == ["prep"] + next_steps = [t.next_step for t in workflow[STATE_INITIALIZED]] + assert next_steps and set(next_steps) == {"prep"} def test_without_with_prep_standard_skips_prep() -> None: diff --git a/tests/test_schemas.py b/tests/test_schemas.py index eec54db6..00d139de 100644 --- a/tests/test_schemas.py +++ b/tests/test_schemas.py @@ -752,3 +752,92 @@ def test_complexity_missing_rejected() -> None: del payload["tasks"][0]["complexity"] errors = list(Draft7Validator(schema).iter_errors(payload)) assert len(errors) > 0 + + +# --------------------------------------------------------------------------- +# T12(d): open_questions schema validation (prep.json) +# --------------------------------------------------------------------------- + +def _minimal_prep_payload(**overrides: object) -> dict: + payload: dict[str, object] = { + "skip": False, + "task_summary": "Test prep.", + "key_evidence": [], + "relevant_code": [], + "test_expectations": [], + "constraints": [], + "suggested_approach": "Proceed.", + } + payload.update(overrides) + return payload + + +def test_prep_schema_validates_populated_open_questions() -> None: + schema = SCHEMAS["prep.json"] + payload = _minimal_prep_payload( + open_questions=[ + { + "severity": "blocking", + "question": "Which auth library?", + }, + { + "severity": "assume_and_proceed", + "question": "Which cache backend?", + "assumption": "Redis is fine.", + }, + ], + ) + errors = list(Draft7Validator(schema).iter_errors(payload)) + assert errors == [], f"Expected no errors, got: {errors}" + + +def test_prep_schema_validates_absent_open_questions() -> None: + schema = SCHEMAS["prep.json"] + payload = _minimal_prep_payload() + assert "open_questions" not in payload + errors = list(Draft7Validator(schema).iter_errors(payload)) + assert errors == [], f"Expected no errors, got: {errors}" + + +def test_prep_schema_validates_empty_open_questions() -> None: + schema = SCHEMAS["prep.json"] + payload = _minimal_prep_payload(open_questions=[]) + errors = list(Draft7Validator(schema).iter_errors(payload)) + assert errors == [], f"Expected no errors, got: {errors}" + + +def test_prep_schema_rejects_open_question_missing_severity() -> None: + schema = SCHEMAS["prep.json"] + payload = _minimal_prep_payload( + open_questions=[{"question": "Missing severity."}], + ) + errors = list(Draft7Validator(schema).iter_errors(payload)) + assert len(errors) > 0 + + +def test_prep_schema_rejects_open_question_missing_question() -> None: + schema = SCHEMAS["prep.json"] + payload = _minimal_prep_payload( + open_questions=[{"severity": "blocking"}], + ) + errors = list(Draft7Validator(schema).iter_errors(payload)) + assert len(errors) > 0 + + +def test_prep_schema_rejects_open_question_invalid_severity() -> None: + schema = SCHEMAS["prep.json"] + payload = _minimal_prep_payload( + open_questions=[{"severity": "critical", "question": "Bad severity."}], + ) + errors = list(Draft7Validator(schema).iter_errors(payload)) + assert len(errors) > 0 + + +def test_prep_schema_validates_open_questions_without_assumption() -> None: + """assumption is optional — only severity and question are required.""" + schema = SCHEMAS["prep.json"] + payload = _minimal_prep_payload( + open_questions=[{"severity": "blocking", "question": "No assumption field."}], + ) + errors = list(Draft7Validator(schema).iter_errors(payload)) + assert errors == [], f"Expected no errors, got: {errors}"