Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
87 changes: 87 additions & 0 deletions briefs/prep-clarification-escape-hatch.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# Prep clarification escape hatch — surface blocking ambiguities to the human

Spec shorthand: `partnered/full` (default depth). Codebase touchpoints already mapped below — no `--with-prep` needed.

## Outcome

Give the `prep` phase a first-class way to (1) record open questions / ambiguities structurally as a category distinct from findings, and (2) **optionally halt the run and hand genuinely blocking questions back to a human** instead of silently guessing. The halt behavior is **default-ON** ("prep may ask"); a flag opts out, and the decision doc tells users to opt out on cloud / unattended runs where no human is present to answer.

A reviewer checks: a prep run that hits a blocking ambiguity pauses the plan into `AWAITING_HUMAN` with the questions surfaced; the same run with the opt-out flag proceeds, recording the questions as explicit assumptions; non-blocking questions never halt under either setting.

## Background — why

Today prep already *notices* ambiguity: `_assemble_prep_outputs` writes `gap_notes` and `contradiction_notes` into `prep_metrics.json`, and the distill prompt instructs the model to populate them "when the findings disagree, time out, error, or leave concrete uncertainty." **But those notes live only in the metrics sidecar** — `distill_prep()` filters its payload down to `PREP_COMPATIBLE_KEYS`, which drops them, so nothing downstream (`plan`) ever reads them. Net effect: prep sees the ambiguity, writes it to a file nobody reads, and the planner proceeds to guess. This is the known "prep silently guesses scope instead of failing loud" gap.

The halt-and-ask primitive already exists but is wired only to the plan phase: `STATE_AWAITING_HUMAN` (an automation-terminal state the auto-driver cleanly pauses on) plus the `ClarificationRecord` type (`refined_idea / intent_summary / questions`) stored at `state["clarification"]`. This work threads prep into that existing machinery — it does not invent a new state, a new return channel, or a new interactive loop.

## Scope (IN)

1. **Structured `open_questions` in prep output.** Add an `open_questions[]` field to the prep payload, carried all the way to `plan` (i.e. added to `PREP_COMPATIBLE_KEYS` and the `prep.json` schema). Each entry is classified by severity:
- `blocking` — a genuine ambiguity that changes the plan and the model cannot responsibly resolve alone.
- `assume_and_proceed` — uncertainty the model resolved by making an explicit, recorded assumption (carries an `assumption` string).
Source these from the existing `gap_notes` / `contradiction_notes` signal the distill step already produces; do not build a parallel detection mechanism.

2. **Distill prompt update.** Instruct the distill step to emit `open_questions`, classify each by severity, and for `assume_and_proceed` state the assumption it made. Keep the existing gap/contradiction notes in the metrics sidecar as-is.

**Consumption disposition (mirror the critique "flags are hypotheses, not verdicts" stance).** Whatever consumes the prep output — the `plan` prompt, and the operator reading a paused run — must be told explicitly that **not everything prep flags is meaningful or relevant**, and to apply discretion about what to act on. A prep `open_question` (blocking or not) is a *candidate concern*, not a mandate: the planner weighs each on its merits, acts on the material ones, and dismisses the immaterial / irrelevant / already-resolved ones with a reason. Question *count* is not itself a signal. Add this framing to the `plan` prompt where it ingests prep output (and to the surfaced text on a paused run), parallel to how the existing critique guidance frames flags. This applies even to `blocking` questions: a human answering a paused run should understand they may legitimately judge a flagged "blocker" to be a non-issue and resume.

3. **Default-ON "may ask" halt.** When prep finishes with ≥1 `blocking` question **and** clarify is enabled (the default), transition the plan to `STATE_AWAITING_HUMAN`, writing the blocking questions into `ClarificationRecord.questions` (reuse the existing type and `state["clarification"]` slot). Add the required `prep → AWAITING_HUMAN` transition to the workflow state machine. Non-blocking questions never trigger this.

4. **Opt-out flag, default enabled.** Add `--no-prep-clarify` (disables the halt). Default = enabled (prep may ask). Persist the setting into `state.config` at `init` so the prep handler reads it. Wire it through `init` and accept it as a per-milestone field for `chain`. Optionally honor a `[defaults]` config key (e.g. `prep_clarify = false`) so a cloud box can opt out once — nice-to-have, not required.

5. **Auto-driver paused outcome surfaces the questions.** The auto-driver already exits cleanly on `AWAITING_HUMAN`; ensure its paused `DriverOutcome.reason` (or log) includes the blocking questions so the operator sees *what* to answer without digging into state.json.

6. **Resume path documented (reuse existing verbs).** The human answers by injecting guidance via the existing `override add-note`, then resumes the plan. Document this loop. Do **not** build a new `clarify --answer` command.

7. **Decision-doc update** (`megaplan/data/decision_skill.md`, the symlinked source behind the `megaplan-decision` skill). Document the default-ON behavior, the `--no-prep-clarify` flag, and explicit guidance: **opt out on cloud / unattended runs** (no human to answer → a blocking question would just strand the run at `AWAITING_HUMAN`). Place it in the Prep optional-phase section and add the modifier to the flag list.

## Scope (OUT) / anti-scope

- **Do not** add a new interactive CLI command for answering questions; reuse `override add-note` + resume.
- **Do not** add cloud auto-detection — the cloud opt-out is *documented guidance* + the flag, not runtime detection.
- **Do not** make the auto-driver block/wait on input; it must keep its non-interactive contract (pause-and-exit only).
- **Do not** rework the plan-phase clarification path beyond reusing `ClarificationRecord`.
- **Do not** redesign the gap/contradiction capture in `prep_metrics.json`; source from it, leave it intact.
- **Do not** change prep's 3-step pipeline shape (triage → fan-out → distill) or its model routing.

## Locked decisions

- Reuse `STATE_AWAITING_HUMAN` and `ClarificationRecord` — no new state or type.
- Halt is **default-ON**; opt-out flag is `--no-prep-clarify`.
- Severity gating: only `blocking` halts; `assume_and_proceed` rides forward as a recorded assumption visible to `plan`.
- Cloud opt-out is documentation + flag, not auto-detection.
- Resume = `override add-note` + resume; no new command.

## Open questions for the planner

- Exact field name/shape for severity classification and the assumption string within `open_questions[]` — pick something consistent with existing prep schema conventions.
- Whether the `[defaults].prep_clarify` config key is worth including now or deferred — planner's call; it's optional.

## Constraints

- The auto-driver must remain strictly non-interactive (pause-and-exit, never block on stdin).
- Backward-compat: existing prep outputs / `prep.json` consumers must not break when `open_questions` is absent or empty (treat as no questions).
- A run with `--no-prep-clarify` must behave exactly like today's halt-free flow, except that blocking questions are now recorded (as assumptions/notes) rather than dropped.

## Done criteria

- Tests:
- prep yielding a `blocking` question with clarify enabled → plan transitions to `AWAITING_HUMAN`, questions present in `ClarificationRecord`.
- same with `--no-prep-clarify` → plan proceeds past prep; blocking question recorded (not dropped, not halting).
- `assume_and_proceed` questions never cause a halt under either setting.
- `prep.json` schema validates `open_questions[]`; absence/empty is valid.
- auto-driver paused outcome/log includes the blocking question text.
- `decision_skill.md` updated with flag + cloud opt-out guidance.
- Existing prep/auto test suites still pass.

## Touchpoints (file:line from current tree)

- `megaplan/orchestration/prep_research.py` — `PREP_COMPATIBLE_KEYS` (~32–41), `distill_prep()` (~904–944), `_assemble_prep_outputs` gap/contradiction notes (~469–474), `prep.json` assembly (~1011, the `_artifact_json(plan_dir, "prep.json", …)` write).
- `megaplan/schemas/runtime.py` — the `prep.json` schema is defined inline here (~98–153, under the `"prep.json"` key); add `open_questions[]` there. (There is no `.megaplan/schemas/prep.json` file.)
- `megaplan/prompts/planning.py` — `_prep_distill_prompt` distill prompt (~475–531), and the `plan` prompt's prep-ingestion section (add the "use discretion; not every flag is meaningful" framing here, parallel to the existing critique-flag guidance).
- `megaplan/types.py` — `ClarificationRecord` (~130–133), state constants & `AUTOMATION_TERMINAL_STATES` (~13–37).
- `megaplan/_core/workflow_data.py` — transitions (~76–78); add `prep → AWAITING_HUMAN`.
- `megaplan/auto.py` — paused-outcome handling for `AWAITING_HUMAN` (~1143–1178).
- `megaplan/handlers/override.py` — `_override_add_note()` (~133–168) for the documented resume loop (reference only).
- CLI init + chain milestone parsing — `megaplan/cli.py` (existing prep flags ~L3114/L3710) and `megaplan/chain.py` (`_init_plan` ~L1441); wire `--no-prep-clarify` and per-milestone field; persist into `state.config`.
- `megaplan/data/decision_skill.md` — doc update.
18 changes: 17 additions & 1 deletion megaplan/_core/user_config.py
Original file line number Diff line number Diff line change
Expand Up @@ -72,4 +72,20 @@ def default_vendor(home: Path | None = None) -> str:
return "claude"


__all__ = ["VALID_VENDORS", "default_vendor", "load_user_config_toml"]
def default_prep_clarify(home: Path | None = None) -> bool:
"""Return the configured default for prep_clarify, falling back to ``True``.

Reads ``[defaults].prep_clarify`` from the user's ``config.toml``.
Only boolean values are accepted; anything else falls back to ``True``.
"""
data = load_user_config_toml(home)
defaults = data.get("defaults")
if not isinstance(defaults, dict):
return True
prep_clarify = defaults.get("prep_clarify")
if isinstance(prep_clarify, bool):
return prep_clarify
return True


__all__ = ["VALID_VENDORS", "default_prep_clarify", "default_vendor", "load_user_config_toml"]
2 changes: 2 additions & 0 deletions megaplan/_core/workflow_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ class Transition:
WORKFLOW: dict[str, list[Transition]] = {
STATE_INITIALIZED: [
Transition("prep", STATE_PREPPED),
Transition("prep", STATE_AWAITING_HUMAN),
],
STATE_PREPPED: [
Transition("plan", STATE_PLANNED),
Expand Down Expand Up @@ -75,6 +76,7 @@ class Transition:
],
STATE_AWAITING_HUMAN: [
Transition("verify-human", STATE_DONE),
Transition("resume-clarify", STATE_PREPPED),
],
STATE_TIEBREAKER_PENDING: [
Transition("tiebreaker-run", STATE_TIEBREAKER_READY),
Expand Down
36 changes: 34 additions & 2 deletions megaplan/auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -897,6 +897,19 @@ def _clear_orphaned_active_step(plan_dir: Path | None, expected_step: str) -> bo
return True


def _plan_clarification(plan_dir: Path | None) -> dict[str, Any] | None:
"""Read the ``clarification`` field from the plan state file, if any."""
if plan_dir is None:
return None
try:
state_data = json.loads((plan_dir / "state.json").read_text(encoding="utf-8"))
except (OSError, json.JSONDecodeError, ValueError):
return None
if not isinstance(state_data, dict):
return None
return state_data.get("clarification")


def drive(
plan: str,
*,
Expand Down Expand Up @@ -1141,12 +1154,31 @@ def _outcome(
# Terminal: plan reached a final state (or automation-terminal).
if state in AUTOMATION_TERMINAL_STATES and not (state == STATE_BLOCKED and valid_next):
if state == STATE_AWAITING_HUMAN:
log("plan awaiting human verification — automation stopping")
# Distinguish prep-sourced halts (blocking ambiguities) from
# criteria-verification halts. Prep halts include the blocking
# questions and the resume loop so the operator can act.
clarification = _plan_clarification(plan_dir)
if isinstance(clarification, dict) and clarification.get("source") == "prep":
questions = clarification.get("questions") or []
question_list = "\n".join(f" - {q}" for q in questions)
hint = (
"answer via `megaplan override add-note \"…\"` and resume via "
"`megaplan override resume-clarify`"
)
reason = (
f"prep surfaced {len(questions)} blocking "
f"ambiguities; a human may judge a flagged blocker a non-issue.\n"
f"blocking questions:\n{question_list}\n{hint}"
)
log(f"plan awaiting human clarification ({len(questions)} blocking questions) — automation stopping")
else:
log("plan awaiting human verification — automation stopping")
reason = "plan has criteria requiring human verification"
return _outcome(
"awaiting_human",
final_state=state,
iterations=iteration,
reason="plan has criteria requiring human verification",
reason=reason,
last_phase=last_phase,
)
if state == STATE_TIEBREAKER_PENDING:
Expand Down
13 changes: 13 additions & 0 deletions megaplan/chain.py
Original file line number Diff line number Diff line change
Expand Up @@ -161,6 +161,7 @@ class MilestoneSpec:
deepseek_provider: str | None = None
with_prep: bool = False
with_feedback: bool = False
prep_clarify: bool = True
prep_direction: str | None = None
phase_model: list[str] = field(default_factory=list)
bakeoff: dict[str, Any] | None = None
Expand Down Expand Up @@ -211,6 +212,13 @@ def from_dict(cls, raw: dict[str, Any], index: int) -> "MilestoneSpec":
)
with_prep = _optional_bool(raw, "with_prep", index=index)
with_feedback = _optional_bool(raw, "with_feedback", index=index)
prep_clarify_raw = raw.get("prep_clarify")
if prep_clarify_raw is None:
prep_clarify = True
elif isinstance(prep_clarify_raw, bool):
prep_clarify = prep_clarify_raw
else:
raise CliError("invalid_spec", f"milestones[{index}].prep_clarify must be a boolean")
prep_direction_raw = raw.get("prep_direction")
if prep_direction_raw is None:
prep_direction = None
Expand Down Expand Up @@ -252,6 +260,7 @@ def from_dict(cls, raw: dict[str, Any], index: int) -> "MilestoneSpec":
deepseek_provider=deepseek_provider,
with_prep=with_prep,
with_feedback=with_feedback,
prep_clarify=prep_clarify,
prep_direction=prep_direction,
phase_model=phase_model,
bakeoff=bakeoff,
Expand Down Expand Up @@ -1451,6 +1460,7 @@ def _init_plan(
deepseek_provider: str | None = None,
with_prep: bool = False,
with_feedback: bool = False,
prep_clarify: bool = True,
prep_direction: str | None = None,
phase_model: list[str] | None = None,
writer,
Expand Down Expand Up @@ -1480,6 +1490,8 @@ def _init_plan(
args.append("--with-prep")
if with_feedback:
args.append("--with-feedback")
if not prep_clarify:
args.append("--no-prep-clarify")
if prep_direction:
args.extend(["--prep-direction", prep_direction])
for override in phase_model or []:
Expand Down Expand Up @@ -1910,6 +1922,7 @@ def log(msg: str, **fields: Any) -> None:
deepseek_provider=milestone.deepseek_provider,
with_prep=milestone.with_prep,
with_feedback=milestone.with_feedback,
prep_clarify=milestone.prep_clarify,
prep_direction=milestone.prep_direction,
phase_model=milestone.phase_model,
writer=writer,
Expand Down
10 changes: 10 additions & 0 deletions megaplan/cli.py
Original file line number Diff line number Diff line change
Expand Up @@ -3122,6 +3122,15 @@ def _add_vendor_critic_args(parser: argparse.ArgumentParser) -> None:
"APIs, research-heavy briefs, or ambiguous requirements. "
"Redundant on --robustness thorough|extreme (no-op).",
)
parser.add_argument(
"--no-prep-clarify",
action="store_false",
dest="prep_clarify",
default=True,
help="Disable the prep clarification gate. When set, prep blocking "
"questions are recorded in prep.json but do not halt the run; "
"planning proceeds immediately. Default: enabled.",
)
parser.add_argument(
"--with-feedback",
action="store_true",
Expand Down Expand Up @@ -3807,6 +3816,7 @@ def build_parser() -> argparse.ArgumentParser:
"add-note",
"replan",
"recover-blocked",
"resume-clarify",
"set-robustness",
"set-profile",
"set-model",
Expand Down
Loading