Skip to content

Add wrong-location guardrails: PreToolUse hook + artifact validator#116

Open
KjellKod wants to merge 19 commits into
mainfrom
claude/review-intelligence-roadmap-i0my2
Open

Add wrong-location guardrails: PreToolUse hook + artifact validator#116
KjellKod wants to merge 19 commits into
mainfrom
claude/review-intelligence-roadmap-i0my2

Conversation

@KjellKod
Copy link
Copy Markdown
Owner

Summary

Ship two additive guardrails that ride out through the installer to every Quest user, targeting the top friction class in the eval data (wrong-branch / wrong-directory / wrong nested .quest/ edits — 55 "wrong approach" events across recent sessions).

  • A PreToolUse Edit|Write hook that prints branch + working directory before every edit.
  • A post-invocation artifact-path validator wired into workflow.md's Handoff File Polling pattern.

Important

Existing Quest installs with a customized .claude/settings.json will receive a .quest_updated sidecar on the next installer run and must merge the new PreToolUse Edit|Write block manually. The installer's merge-carefully mode is checksum-compare, not JSON-aware merge. The caveat is documented in AGENTS.md; the manifest contract test asserts the caveat string stays present.

Changes

  • Hook:
    • .claude/hooks/branch-dir-context.sh — prints [quest-context] branch=<name|no-git> dir=<path>; exits 0 in git, non-git, and detached-HEAD contexts; never blocks the wrapped tool call.
    • .claude/settings.json — additive PreToolUse Edit|Write entry; existing SessionStart and PostToolUse Write|Edit hooks preserved verbatim.
  • Validator:
    • scripts/quest_artifact_postflight.py — filesystem-only check against expected_artifacts_for_role(...). Five mismatch reasons (missing, outside_boundary, noncanonical_filename, nested_quest_path, path_traversal) plus defensive unsupported_role_or_phase. Non-zero exit on mismatch; appends one JSON line per mismatch to .quest/<id>/logs/path_compliance.log. Orchestrator policy is accepted_with_warnings (non-halting per kill criterion). Latency target <50 ms, regression cap <200 ms.
  • Workflow wire-in:
    • .skills/quest/delegation/workflow.md — new postflight step appended at the end of the Handoff File Polling pattern. Items 1–6 keep their numbers so the 10 hard-coded Handoff File Polling §5/§6 cross-references elsewhere in the doc don't drift.
  • Manifest:
    • .quest-manifest — both new files added under [copy-as-is] so downstream installs receive them.
  • Documentation:
    • AGENTS.md — "Wrong-location guardrails" section naming both guardrails, the disable mechanism, and the installer caveat.
    • ideas/README.md — Done Index row added; three active rows removed.
    • docs/quest-journal/wrong-location-guardrails_2026-05-18.md — quest journal with celebration data.
  • Archival:
    • ideas/2026-05-17-wrong-location-guardrails.md, ideas/2026-04-15-pretooluse-branch-dir-verification-hook.md, ideas/2026-04-15-subagent-path-constraints-hardening.md moved to ideas/archive/ via git mv.
  • Tests:
    • tests/unit/test_branch_dir_context_hook.sh — hook contexts (git, non-git, detached-HEAD), stdout format, exit-0 guarantee.
    • tests/unit/test_quest_artifact_postflight.py — 12 cases including latency target and regression cap behind pytest.mark.perf.
    • tests/unit/test_workflow_postflight_wireup.py — placement (end-of-section) + cross-reference count assertions (§5 × 6, §6 × 4).
    • tests/unit/test_manifest_guardrails_entries.py[copy-as-is] entries + AGENTS.md literals + merge-carefully posture for settings.json.
    • tests/unit/test_ideas_archive_guardrails_housekeeping.py — archive paths exist + Done Index row points at the journal.
  • Config:
    • pyproject.toml — registers perf marker so non-perf runs filter cleanly.
    • .ai/allowlist.json — every role pinned to claude for this quest (committed earlier in the branch).

Validation

  • Run the unit suite. Prerequisites: repo root with pyproject.toml. Action: python3 -m pytest tests/unit/ -q -m "not perf". Expected: 584 tests pass, 0 failures. Then python3 -m pytest tests/unit/test_quest_artifact_postflight.py -q -m perf for the two latency tests (<50 ms typical and <200 ms stress).
  • Run the hook shell tests. Action: bash tests/unit/test_branch_dir_context_hook.sh. Expected: 5 cases pass — git repo with a named branch, non-git directory, detached-HEAD, stdout format check, exit-0 guarantee.
  • Exercise the hook live. Prerequisites: this branch checked out. Action: trigger any Edit or Write tool call (or run bash .claude/hooks/branch-dir-context.sh directly). Expected: one line [quest-context] branch=claude/review-intelligence-roadmap-i0my2 dir=/home/user/quest (or equivalent) printed to stdout. The wrapped tool call is unaffected.
  • Exercise the validator. Prerequisites: the working tree. Action: python3 scripts/quest_artifact_postflight.py --quest-dir .quest/wrong-location-guardrails_2026-05-18__0003 --role planner --phase plan (substitute any role/phase). Expected: exit 0 (no mismatches against expected_artifacts_for_role). Then run with a fake declared artifact pointing outside .quest/: expected non-zero exit and a JSON record appended to .quest/wrong-location-guardrails_2026-05-18__0003/logs/path_compliance.log with reason: "outside_boundary".
  • Verify workflow placement and cross-reference counts. Action: grep -n "quest_artifact_postflight" .skills/quest/delegation/workflow.md (expect the line to be after the "Three-tier fallback ladder" header at ~line 143) and grep -c "Handoff File Polling §5" .skills/quest/delegation/workflow.md (expect 6) and grep -c "Handoff File Polling\*\* §6" .skills/quest/delegation/workflow.md (expect 4).

Watch for: a customized .claude/settings.json in your install — the installer will not automatically merge the new PreToolUse Edit|Write entry into it. Pick up the new block from the .quest_updated sidecar by hand.

Notes

  • Plan history: 3 plan iterations, 0 fix iterations. Reviewer B caught two real defects during planning — (1) the installer merge-carefully gap that would have shipped the hook to no one with a customized settings.json, and (2) the 10-cross-reference rot that mid-list step insertion would have caused in workflow.md. Both were verified empirically (line numbers, grep counts) before the arbiter ruled. Final iteration approved with 0 findings from both code reviewers.
  • Validator wire-in is doc-only. No new Python module auto-invokes run(...). Orchestrators read the workflow doc and call the CLI. Auto-invocation is logged as a follow-up backlog item.
  • Failure policy is accepted_with_warnings (non-halting). CLI exits non-zero, structured log persists, orchestrator surfaces a warning. Tighten to halting only once false-positive rate is empirically zero in the field.
  • 3 deferred findings were appended to .quest/backlog/deferred_findings.jsonl for the next quest touching the postflight validator (docstring drift, fragile test heuristic, --quest-mode help-text polish).
     ▐▛███▜▌
    ▝▜█████▛▘
      ▘▘ ▝▝
Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 
in collaboration with the repository author

Generated by Claude Code

@KjellKod KjellKod marked this pull request as ready for review May 18, 2026 03:31
@KjellKod KjellKod temporarily deployed to codex-ci-review May 18, 2026 03:31 — with GitHub Actions Inactive
Comment thread .ai/allowlist.json Outdated
Comment thread scripts/quest_artifact_postflight.py Outdated
KjellKod pushed a commit that referenced this pull request May 18, 2026
Two PR #116 review findings:

1) Validator silently accepted missing or misplaced canonical artifacts
   (Codex inline, scripts/quest_artifact_postflight.py:214). The loop
   iterated declared_paths only, so an "artifacts": [] handoff returned 0
   without recording mismatches, and a canonical-named file at a misplaced
   sub-path inside the role boundary (e.g. phase_01_plan/nested/plan.md)
   passed every per-path check.

   Fix: in run(), compare resolved expected_paths to resolved declared_paths
   before the per-declared loop. For each expected canonical path that is
   not declared or not present on disk at the canonical location, emit a
   "missing" mismatch record. Remove the existence check from _check_one
   to avoid double-emitting when a declared canonical path is also flagged
   by the coverage step. AC4 ("mismatches cause non-zero exit and are not
   silently accepted") now holds for the empty-artifacts and misplaced
   cases as well.

   Three new tests cover the failure modes:
   - empty artifacts array for a role with expected paths
   - omitted canonical artifact
   - misplaced canonical-named file inside the phase boundary

2) Revert .ai/allowlist.json model pins. The all-Claude pin was scoped to
   the wrong-location-guardrails quest only; the per-quest snapshot at
   .quest/<id>/logs/allowlist_snapshot.json preserves that record. Keeping
   the global pin on main would change repo-default Quest routing for
   every checkout. Restored the prior mixed-model defaults.

Tests: 587 unit + 2 perf (was 584 + 2). Both new mismatch behaviors
covered.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 issues found across 18 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".skills/quest/delegation/workflow.md">

<violation number="1" location=".skills/quest/delegation/workflow.md:184">
P2: The postflight example uses a generic `handoff.json` path that is incorrect for several role-specific handoff filenames.</violation>
</file>

<file name=".claude/hooks/branch-dir-context.sh">

<violation number="1" location=".claude/hooks/branch-dir-context.sh:20">
P2: Escape newlines in the working directory before printing; otherwise a valid path containing `\n`/`\r` breaks the hook's single-line stdout contract.</violation>
</file>

<file name="tests/unit/test_branch_dir_context_hook.sh">

<violation number="1" location="tests/unit/test_branch_dir_context_hook.sh:1">
P2: This test file is not wired into the repo’s test runner, so CI will never execute it. Add a shell-test step or convert it to a pytest module so the new coverage actually runs.</violation>
</file>

<file name="tests/unit/test_workflow_postflight_wireup.py">

<violation number="1" location="tests/unit/test_workflow_postflight_wireup.py:70">
P2: Scope this check to the actual postflight step, not any filename mention in the file.</violation>

<violation number="2" location="tests/unit/test_workflow_postflight_wireup.py:135">
P2: Don't end the scan on the first bold subheading; it truncates the artifact-preparation block early.</violation>
</file>

<file name="scripts/quest_artifact_postflight.py">

<violation number="1" location="scripts/quest_artifact_postflight.py:136">
P2: Catch `OSError`, `UnicodeDecodeError`, and `json.JSONDecodeError` when reading the handoff file, and ensure `data` is a dictionary before calling `.get()`. This avoids unhandled exceptions (e.g., `FileNotFoundError` or `AttributeError`) that could crash the postflight validator if the orchestrator text-fallback didn't write a valid JSON file.

(Based on your team's feedback about catching OSError and UnicodeDecodeError when reading UTF-8 files to avoid crashing quest completion.) [FEEDBACK_USED]</violation>
</file>

Tip: cubic can generate docs of your entire codebase and keep them up to date. Try it here.
Re-trigger cubic

Comment thread .skills/quest/delegation/workflow.md Outdated
Comment thread .claude/hooks/branch-dir-context.sh Outdated
Comment thread tests/unit/test_branch_dir_context_hook.sh
Comment thread tests/unit/test_workflow_postflight_wireup.py Outdated
Comment thread tests/unit/test_workflow_postflight_wireup.py Outdated
Comment thread scripts/quest_artifact_postflight.py Outdated
KjellKod pushed a commit that referenced this pull request May 18, 2026
Capture the design and analysis surfaced by PR #116 review: today the only
place to set per-role model assignments is the shared .ai/allowlist.json,
so a one-quest override either ships globally or has to be reverted by
hand after the quest closes (the exact friction Codex flagged on PR #116).

The proposed design lets each quest carry its own active orchestration:

- After preflight + route selection but before folder creation, show the
  current models block and offer a one-keystroke confirm or a short
  per-role chooser.
- Write the chosen models block into a per-quest orchestration.json
  alongside the existing allowlist snapshot.
- Make every workflow.md role-dispatch site read from
  .quest/<id>/orchestration.json instead of .ai/allowlist.json.
- Resume invocations never re-prompt; the chosen orchestration is locked
  at quest start.

The idea doc includes my analysis (feasibility, usefulness, honest
counter-position) and a recommended quest prompt for when it is ready
to implement.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".ai/allowlist.json">

<violation number="1">
P2: This change introduces inconsistent model routing by switching several roles from `claude` to `gpt-5.5`, which conflicts with the stated allowlist pinning and can cause non-deterministic behavior across quest phases.</violation>
</file>

<file name="scripts/quest_artifact_postflight.py">

<violation number="1" location="scripts/quest_artifact_postflight.py:228">
P2: Comparing coverage with `Path.resolve()` lets a symlinked artifact satisfy multiple expected outputs, so an omitted canonical file can still pass validation.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic

Comment thread scripts/quest_artifact_postflight.py Outdated
KjellKod pushed a commit that referenced this pull request May 18, 2026
Six P2 findings from the cubic AI reviewer, all valid:

- workflow.md postflight invocation example used a generic handoff.json
  path. Most sub-agent roles write role-suffixed handoff files
  (handoff_plan-reviewer-a.json, handoff_arbiter.json, handoff_fixer.json,
  etc.). The example now uses <expected_handoff_path_for_role> with a
  pointer back to the role-to-filename table earlier in the same section.

- branch-dir-context.sh single-line stdout contract did not survive
  paths containing LF/CR (legal in POSIX path components). Collapse
  embedded LF/CR to spaces with `tr` before formatting the output line
  so the consumer still sees exactly one line.

- tests/unit/test_branch_dir_context_hook.sh was never invoked by the
  test job because pytest does not collect shell files. Added
  tests/unit/test_branch_dir_context_hook_runner.py that subprocesses
  bash to execute the shell suite and surfaces failure as a pytest
  failure with captured stdout/stderr.

- test_workflow_postflight_wireup.py #1 (references_postflight_script)
  asserted only that the literal appeared anywhere in workflow.md. Now
  scoped to occurrences inside the Handoff File Polling section so a
  stray comment elsewhere can't satisfy the wire-in contract.

- test_workflow_postflight_wireup.py #5
  (no_pre_invocation_postflight_reference) terminated its scan on the
  first bold subheading after the Artifact preparation bullet, which
  truncated the block early (e.g., the `**Codex sandbox permissions:**`
  sub-item closed the scan before reaching the bullet's tail). Switched
  to a numbered-list-item boundary (`^\d+\. `) so the scan covers the
  full bullet block until item 6 starts.

- quest_artifact_postflight.py _load_declared_artifacts could crash on
  unreadable handoff bytes or malformed JSON (FileNotFoundError,
  UnicodeDecodeError, JSONDecodeError) and on non-dict payloads
  (AttributeError on .get()). Wrapped both read paths and added a dict
  guard. Failure cases now return an empty declared list and the
  coverage check flags every expected canonical path as missing — the
  validator no longer crashes when the orchestrator falls back to text
  parsing.

- quest_artifact_postflight.py coverage check used Path.resolve(), which
  follows symlinks. Two distinct expected paths could collapse onto one
  declared symlink target, satisfying coverage for an omitted canonical
  artifact. Switched the identity comparison to
  os.path.normpath(os.path.abspath(...)) so distinct canonical paths
  remain distinct. Existence still uses .exists() (symlink-followed) —
  that's the right semantic for "is the file there?".

Tests: 588 unit (was 587, +1 shell-runner wrapper) + 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
@KjellKod KjellKod temporarily deployed to codex-ci-review May 18, 2026 03:55 — with GitHub Actions Inactive
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 5 files (changes from recent commits).

Requires human review: Auto-approval blocked by 1 unresolved issue from previous reviews.
Re-trigger cubic

Comment thread scripts/quest_artifact_postflight.py
@KjellKod KjellKod temporarily deployed to codex-ci-review May 18, 2026 04:36 — with GitHub Actions Inactive
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 2 files (changes from recent commits).

Requires human review: This PR adds two new guardrails (a PreToolUse hook and a post-invocation artifact validator) that run on every Edit/Write tool call and every sub-agent handoff, which touches critical runtime paths; the 2688-line change includes new scripts, workflow logic, and configuration that require human...
Re-trigger cubic

Comment thread scripts/quest_artifact_postflight.py
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/quest_artifact_postflight.py">

<violation number="1" location="scripts/quest_artifact_postflight.py:373">
P1: This early return skips validation for every non-current-quest path, so missing workspace files and sibling `.quest/` artifacts are silently accepted.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic

Comment thread scripts/quest_artifact_postflight.py
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/quest_artifact_postflight.py">

<violation number="1" location="scripts/quest_artifact_postflight.py:384">
P2: Check for sibling-quest paths before the existence test so wrong-location writes are not downgraded to `missing`.</violation>

<violation number="2" location="scripts/quest_artifact_postflight.py:393">
P2: This `.quest` check is too broad: it will reject shared paths like `.quest/cache/...`, not just sibling quest directories.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic

Comment thread scripts/quest_artifact_postflight.py Outdated
Comment thread scripts/quest_artifact_postflight.py Outdated
@KjellKod KjellKod temporarily deployed to codex-ci-review May 18, 2026 05:06 — with GitHub Actions Inactive
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 2 files (changes from recent commits).

Requires human review: This PR modifies core orchestration workflow, hook configuration, and adds a new artifact validation script, all of which are high-impact changes that require human review to ensure correctness and avoid unintended side effects.
Re-trigger cubic

Comment thread scripts/quest_artifact_postflight.py Outdated
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="scripts/quest_artifact_postflight.py">

<violation number="1" location="scripts/quest_artifact_postflight.py:416">
P1: In worktree mode, shared `.quest` paths (like `cache/` or `backlog/`) and sibling quest artifacts will incorrectly trigger a `traversal_outside_repo` mismatch because they are validated against `workspace_root` instead of `repo_root`.

(Based on your team's feedback about Non-current-quest path validation with shared .quest allowlist.) [FEEDBACK_USED]</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic

Comment thread scripts/quest_artifact_postflight.py Outdated
@KjellKod KjellKod temporarily deployed to codex-ci-review May 18, 2026 05:32 — with GitHub Actions Inactive
cubic-dev-ai[bot]
cubic-dev-ai Bot previously approved these changes May 18, 2026
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 issues found across 2 files (changes from recent commits).

Auto-approved: These changes add two additive, non-halting guardrails—an observational pre-edit hook and an advisory artifact-path validator—backed by comprehensive tests, with no modifications to critical infrastructure, authentication, or core business logic, and the orchestrator treats any validator failure as
Re-trigger cubic

KjellKod pushed a commit that referenced this pull request May 23, 2026
Six P2 findings from the cubic AI reviewer, all valid:

- workflow.md postflight invocation example used a generic handoff.json
  path. Most sub-agent roles write role-suffixed handoff files
  (handoff_plan-reviewer-a.json, handoff_arbiter.json, handoff_fixer.json,
  etc.). The example now uses <expected_handoff_path_for_role> with a
  pointer back to the role-to-filename table earlier in the same section.

- branch-dir-context.sh single-line stdout contract did not survive
  paths containing LF/CR (legal in POSIX path components). Collapse
  embedded LF/CR to spaces with `tr` before formatting the output line
  so the consumer still sees exactly one line.

- tests/unit/test_branch_dir_context_hook.sh was never invoked by the
  test job because pytest does not collect shell files. Added
  tests/unit/test_branch_dir_context_hook_runner.py that subprocesses
  bash to execute the shell suite and surfaces failure as a pytest
  failure with captured stdout/stderr.

- test_workflow_postflight_wireup.py #1 (references_postflight_script)
  asserted only that the literal appeared anywhere in workflow.md. Now
  scoped to occurrences inside the Handoff File Polling section so a
  stray comment elsewhere can't satisfy the wire-in contract.

- test_workflow_postflight_wireup.py #5
  (no_pre_invocation_postflight_reference) terminated its scan on the
  first bold subheading after the Artifact preparation bullet, which
  truncated the block early (e.g., the `**Codex sandbox permissions:**`
  sub-item closed the scan before reaching the bullet's tail). Switched
  to a numbered-list-item boundary (`^\d+\. `) so the scan covers the
  full bullet block until item 6 starts.

- quest_artifact_postflight.py _load_declared_artifacts could crash on
  unreadable handoff bytes or malformed JSON (FileNotFoundError,
  UnicodeDecodeError, JSONDecodeError) and on non-dict payloads
  (AttributeError on .get()). Wrapped both read paths and added a dict
  guard. Failure cases now return an empty declared list and the
  coverage check flags every expected canonical path as missing — the
  validator no longer crashes when the orchestrator falls back to text
  parsing.

- quest_artifact_postflight.py coverage check used Path.resolve(), which
  follows symlinks. Two distinct expected paths could collapse onto one
  declared symlink target, satisfying coverage for an omitted canonical
  artifact. Switched the identity comparison to
  os.path.normpath(os.path.abspath(...)) so distinct canonical paths
  remain distinct. Existence still uses .exists() (symlink-followed) —
  that's the right semantic for "is the file there?".

Tests: 588 unit (was 587, +1 shell-runner wrapper) + 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
KjellKod pushed a commit that referenced this pull request May 23, 2026
PR #116 review feedback: the rationale for using bash parameter
expansion over ``tr`` invoked hypothetical PATH-restricted sandboxes
missing coreutils — YAGNI for the platforms we support. Keep the
builtin implementation; drop the justification prose.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
KjellKod pushed a commit that referenced this pull request May 23, 2026
PR #116 review (Codex, Must fix): the per-path boundary check used
``resolved.relative_to(boundary_dir)``, which accepts any depth under
the phase directory. A handoff declaring both the real canonical path
(satisfying coverage) and an off-spec ``phase_XX/nested/<canonical>.md``
slipped through — ``_check_one`` accepted the nested entry because it
was under boundary and had a canonical basename.

Require exact parent equality (``resolved.parent == boundary_dir``).
Expected canonical paths share the same phase dir by construction, so
the tighter invariant is consistent with the artifact contract.

Regression pin: ``test_extra_canonical_named_file_in_nested_subdir_
records_outside_boundary`` declares both the canonical and the nested
duplicate and asserts the nested declaration records
``outside_boundary``.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Codex <noreply@openai.com>
in collaboration with the repository author
KjellKod pushed a commit that referenced this pull request May 23, 2026
…inks

PR #116 review (Codex, Must fix): a canonical quest path implemented as
a symlink to a workspace file (e.g. ``pr_description.md`` -> ``scripts/
foo.py``) slipped through. ``_check_one`` classified by the resolved
target, treated the symlink as a workspace file, and accepted it; the
expected-identity comparison in coverage matched on both sides because
both resolve to the same target.

Two related fixes in ``_check_one``/``run``:

1) Classify by the DECLARED path's directory, not the resolved target.
   ``declared_path.parent.resolve() / declared_path.name`` canonicalizes
   the worktree ``.quest`` directory symlink while leaving any
   file-level symlink visible. A symlink at a canonical quest path is
   now classified as a quest artifact and reaches the boundary check.

2) Resolve ``boundary_dir``'s parent only, not the file. ``Path(
   expected_paths[0]).resolve().parent`` would follow a file-level
   symlink on the canonical path and corrupt the boundary to wherever
   the link pointed. Use ``Path(expected_paths[0]).parent.resolve()``
   so the boundary stays anchored to the phase directory regardless of
   whether the canonical artifact was provisioned as a real file or a
   symlink.

Regression pin: ``test_quest_artifact_symlink_escaping_boundary_records_
outside_boundary`` provisions ``pr_description.md`` as a symlink to
``<workspace>/scripts/foo.py``, declares it alongside a real second
canonical, and asserts ``outside_boundary`` on the symlink declaration.

Tests: 781 unit/integration green (one pre-existing dashboard test
failure on this branch is unrelated).

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Codex <noreply@openai.com>
in collaboration with the repository author
@KjellKod KjellKod force-pushed the claude/review-intelligence-roadmap-i0my2 branch from 881b1dd to 1823130 Compare May 23, 2026 00:16
@KjellKod KjellKod temporarily deployed to codex-ci-review May 23, 2026 00:18 — with GitHub Actions Inactive
KjellKod added a commit that referenced this pull request May 29, 2026
* Add per-quest orchestration override idea doc

Copy proposal from PR #116 branch onto orchestration-override for
review. Proposes promoting the per-quest allowlist snapshot to the
source of truth for role dispatch, with a startup confirm/override
step, so per-quest model choices no longer require editing the repo
allowlist.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with KjellKod

* Add per-quest orchestration override

Promote per-quest config to source of truth for role-to-model
dispatch. A new .quest/<id>/orchestration.json (written by the
startup chooser) replaces .ai/allowlist.json as the read source at
every workflow.md dispatch site, so per-quest model choices no
longer require editing the repo allowlist.

- SKILL.md: new sub-step 8.5 (chooser + writer) in Quest Folder
  Creation, new sub-step 1a (resume migration from snapshot).
- workflow.md: 6 dispatch sites + 5 Codex MCP templates + mapping
  table footer + Slot A headers swapped to read from
  .quest/<id>/orchestration.json.
- quest_validate-quest-state.sh: new validate_orchestration_json
  asserting presence and active-mode role coverage at every
  phase transition. orchestration_compat: "legacy" opt-out
  scoped to in-flight quests predating this change.
- scripts/quest_runtime/orchestration.py: testable contract for
  chooser writer, override parser, migration, availability check.
- tests/test-quest-orchestration.sh (new, 12 cases) and 13 new
  cases in tests/test-validate-quest-state.sh cover the chooser
  parse contract, override validation, resume migration, and
  the grep contract on workflow.md.
- AGENTS.md and ideas/2026-05-18-per-quest-orchestration-override.md
  updated with closure notes.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with KjellKod

* Archive orchestration-override quest journal

Quality tier: Platinum. Quest delivered the per-quest
orchestration override end-to-end: chooser, dispatch swap,
validator, runtime helper, 64+12 tests, and the legacy compat
vent for in-flight quests.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with KjellKod

* Address orchestration review findings

* Validate orchestration snapshot shape

* Tighten orchestration fallback validation

* Remove orchestration validation bypass

* Handle Claude-family orchestration models

* Split orchestration runtime availability checks

* Align orchestration dispatch docs

* updated

* Clean orchestration journal whitespace

* Keep sharpen ux defaults portable

* Fix orchestration default availability handling

* updated

* Rewrite install-posture doc to cover both install modes

The doc argued outside-in was the default and in-repo a rare
exception, but the shipped installer makes in-repo the primary,
tooling-supported path. Reframe it to present both modes honestly
with trade-offs and a choose-when guide, add the standard YAML
frontmatter, and index it in the architecture README.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with KjellKod <kjell.hedstrom@gmail.com>

* Ship .quest-manifest so installed repos stay in sync

The installer fetched .quest-manifest only to learn what to copy,
then discarded it, so target repos kept a stale manifest. Newly
installed skills and scripts were then flagged as missing from the
manifest by strict validation in the target repo.

List .quest-manifest in [copy-as-is] and special-case it to always
overwrite with upstream (no prompt, even in --force), since it is
bookkeeping that must match the installed file set. A locally
modified manifest is preserved as .quest-manifest.quest_updated so
an intentional customization stays recoverable. Add tests for the
overwrite+backup and pristine-update paths.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with KjellKod <kjell.hedstrom@gmail.com>

* Clarify Quest ID timestamp handling

Guard against preformatted timestamp strings in the Quest ID helper and make the startup instruction pass datetime.now() explicitly.

Quest/Co-Authored by

Co-Authored-By: Codex <noreply@openai.com>

in collaboration with KjellKod <kjell.hedstrom@gmail.com>

* updated ideas

* Sharpen native-runtime-dispatch idea with Claude facts and slot model

Reframe the proposal so it matches how Quest actually dispatches today
and removes hedges that would invite theatre on the way to implementation.

- Claude context isolation: fresh context in is automatic (no
  fork_context equivalent — stronger than it); the real exposure is
  the Task tool result returning to the orchestrator. Containment is
  the compact-handoff-out contract, not an open research item.
- Note that native Claude Task dispatch is already the live path for
  Claude-family roles; this proposal codifies and guard-rails it.
- Add a slot/role-type/runtime/permission-key distinction so the A/B
  reviewer mixed-runtime case (the default, even) is documented as
  normal: runtime is resolved per slot, A+B can be any family mix, the
  orchestrator model does not constrain slot runtimes, and one shared
  subagent_type definition is reused across slots and families.
- State the permission posture plainly: role_permissions are not
  runtime-enforced today, native dispatch is enforcement-neutral.
  Cross-link the enforcement-activation roadmap.
- Require that orchestration.json's specific same-family model
  variants (e.g. claude-haiku, claude-opus-4.7) are passed explicitly
  rather than silently voided by model: inherit.
- Add a Documentation Updates Required section so user-facing docs are
  treated as definition-of-done, not a follow-up.
- Drop the speed framing; argue native because it is first-party and
  actively optimized by the model vendors, with bridges/MCP scoped to
  cross-family glue.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with KjellKod <kjell.hedstrom@gmail.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
@KjellKod KjellKod deployed to codex-ci-review May 29, 2026 18:32 — with GitHub Actions Active
claude and others added 19 commits May 29, 2026 12:34
Consolidates two 2026-04-15 sub-ideas (PreToolUse branch/dir hook and
sub-agent artifact-path validation) into one shippable brief targeting
the top friction class in the eval data: wrong-branch / wrong-directory
edits. Brief includes goal, deliverables, acceptance criteria, explicit
out-of-scope, kill criteria, and a ready /quest prompt.

Sets every role in .ai/allowlist.json to claude so the upcoming
wrong-location-guardrails quest runs Claude-only end-to-end. Per the
file's own header, takes effect on next quest run.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with KjellKod <kjell.hedstrom@gmail.com>
Flip status on the consolidated brief and the two superseded sub-ideas
from proposed to in-progress, add superseded_by pointers on the two
sub-ideas, and refresh the ideas/README.md index with the new row and
updated statuses.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with KjellKod <kjell.hedstrom@gmail.com>
Add an explicit deliverable and acceptance criterion to the wrong-location
guardrails brief: when the quest ships, move the three in-progress idea
docs (the consolidated 2026-05-17 brief and the two superseded 2026-04-15
sub-ideas) into ideas/archive/, add a done-index row pointing at the
quest journal entry, and drop the active rows from the Execution
Discipline and Observability section so the index reflects shipped state.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with KjellKod <kjell.hedstrom@gmail.com>
Two additive guardrails that ship to every Quest install via .quest-manifest
and target the top friction class in the eval data (wrong-branch /
wrong-directory / wrong nested .quest/ edits, 55 "wrong approach" events
across recent sessions).

- .claude/hooks/branch-dir-context.sh: PreToolUse Edit|Write hook prints
  [quest-context] branch=<name|no-git> dir=<path>. Exit 0 in git, non-git,
  and detached-HEAD contexts; never blocks the wrapped tool call.
  .claude/settings.json wiring is strictly additive; existing SessionStart
  and PostToolUse hooks are preserved verbatim.

- scripts/quest_artifact_postflight.py: filesystem-only validator that
  compares declared handoff artifacts against expected_artifacts_for_role()
  boundaries. Five mismatch reasons (missing, outside_boundary,
  noncanonical_filename, nested_quest_path, path_traversal); exits non-zero
  on mismatch and appends a JSON record to .quest/<id>/logs/path_compliance
  .log. Orchestrator policy is accepted_with_warnings (non-halting) per the
  brief's kill criterion.

- .skills/quest/delegation/workflow.md: new postflight step appended at the
  END of the Handoff File Polling pattern. Items 1-6 keep their numbers so
  the 10 hard-coded "Handoff File Polling §5/§6" cross-references elsewhere
  in the doc don't drift. Two contract tests lock both the placement and
  the cross-reference counts (§5 x 6, §6 x 4) against future edits.

- AGENTS.md: Wrong-location guardrails section names both guardrails,
  documents how to disable each, and surfaces the installer caveat that
  customized .claude/settings.json files receive a .quest_updated sidecar
  and require a one-time manual merge to pick up the PreToolUse entry
  (merge-carefully is checksum compare, not JSON-aware merge).

- pyproject.toml: registers the perf marker so latency tests opt out
  cleanly under non-perf CI runs.

- Tests (tests/unit/): hook contexts, validator branches plus latency
  target (<50 ms) and regression cap (<200 ms), workflow contract +
  cross-reference counts, manifest entries + AGENTS.md literals, archival
  housekeeping. 584 unit tests pass; 2 perf tests pass.

- Archival: the three in-progress idea docs
  (2026-05-17-wrong-location-guardrails, 2026-04-15-pretooluse-branch-dir-
  verification-hook, 2026-04-15-subagent-path-constraints-hardening) move
  to ideas/archive/ via git mv; ideas/README.md Done Index gets a row
  pointing at docs/quest-journal/wrong-location-guardrails_2026-05-18.md.

Quest history: 3 plan iterations, 0 fix iterations. Reviewer B caught the
installer merge-carefully gap at iteration 1 and the 10-cross-reference
regression introduced by iteration 2's chosen placement; both were cross-
verified empirically before the arbiter ruled. Final iteration approved
with 0 findings from both code reviewers.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Two PR #116 review findings:

1) Validator silently accepted missing or misplaced canonical artifacts
   (Codex inline, scripts/quest_artifact_postflight.py:214). The loop
   iterated declared_paths only, so an "artifacts": [] handoff returned 0
   without recording mismatches, and a canonical-named file at a misplaced
   sub-path inside the role boundary (e.g. phase_01_plan/nested/plan.md)
   passed every per-path check.

   Fix: in run(), compare resolved expected_paths to resolved declared_paths
   before the per-declared loop. For each expected canonical path that is
   not declared or not present on disk at the canonical location, emit a
   "missing" mismatch record. Remove the existence check from _check_one
   to avoid double-emitting when a declared canonical path is also flagged
   by the coverage step. AC4 ("mismatches cause non-zero exit and are not
   silently accepted") now holds for the empty-artifacts and misplaced
   cases as well.

   Three new tests cover the failure modes:
   - empty artifacts array for a role with expected paths
   - omitted canonical artifact
   - misplaced canonical-named file inside the phase boundary

2) Revert .ai/allowlist.json model pins. The all-Claude pin was scoped to
   the wrong-location-guardrails quest only; the per-quest snapshot at
   .quest/<id>/logs/allowlist_snapshot.json preserves that record. Keeping
   the global pin on main would change repo-default Quest routing for
   every checkout. Restored the prior mixed-model defaults.

Tests: 587 unit + 2 perf (was 584 + 2). Both new mismatch behaviors
covered.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Six P2 findings from the cubic AI reviewer, all valid:

- workflow.md postflight invocation example used a generic handoff.json
  path. Most sub-agent roles write role-suffixed handoff files
  (handoff_plan-reviewer-a.json, handoff_arbiter.json, handoff_fixer.json,
  etc.). The example now uses <expected_handoff_path_for_role> with a
  pointer back to the role-to-filename table earlier in the same section.

- branch-dir-context.sh single-line stdout contract did not survive
  paths containing LF/CR (legal in POSIX path components). Collapse
  embedded LF/CR to spaces with `tr` before formatting the output line
  so the consumer still sees exactly one line.

- tests/unit/test_branch_dir_context_hook.sh was never invoked by the
  test job because pytest does not collect shell files. Added
  tests/unit/test_branch_dir_context_hook_runner.py that subprocesses
  bash to execute the shell suite and surfaces failure as a pytest
  failure with captured stdout/stderr.

- test_workflow_postflight_wireup.py #1 (references_postflight_script)
  asserted only that the literal appeared anywhere in workflow.md. Now
  scoped to occurrences inside the Handoff File Polling section so a
  stray comment elsewhere can't satisfy the wire-in contract.

- test_workflow_postflight_wireup.py #5
  (no_pre_invocation_postflight_reference) terminated its scan on the
  first bold subheading after the Artifact preparation bullet, which
  truncated the block early (e.g., the `**Codex sandbox permissions:**`
  sub-item closed the scan before reaching the bullet's tail). Switched
  to a numbered-list-item boundary (`^\d+\. `) so the scan covers the
  full bullet block until item 6 starts.

- quest_artifact_postflight.py _load_declared_artifacts could crash on
  unreadable handoff bytes or malformed JSON (FileNotFoundError,
  UnicodeDecodeError, JSONDecodeError) and on non-dict payloads
  (AttributeError on .get()). Wrapped both read paths and added a dict
  guard. Failure cases now return an empty declared list and the
  coverage check flags every expected canonical path as missing — the
  validator no longer crashes when the orchestrator falls back to text
  parsing.

- quest_artifact_postflight.py coverage check used Path.resolve(), which
  follows symlinks. Two distinct expected paths could collapse onto one
  declared symlink target, satisfying coverage for an omitted canonical
  artifact. Switched the identity comparison to
  os.path.normpath(os.path.abspath(...)) so distinct canonical paths
  remain distinct. Existence still uses .exists() (symlink-followed) —
  that's the right semantic for "is the file there?".

Tests: 588 unit (was 587, +1 shell-runner wrapper) + 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Codex flagged a false-positive in the coverage check introduced in
50d16e7: ``expected_artifacts_for_role(...)`` includes the role's own
handoff filename (planner → ``("plan.md", "handoff.json")``, builder →
``("pr_description.md", "builder_feedback_discussion.md", "handoff.json")``,
etc.), but the agent contracts in workflow.md do NOT ask roles to list
their own handoff file in ``handoff.artifacts``. Roles declare
deliverables there. The validator was therefore marking every
well-formed handoff as ``missing`` the handoff file itself — every real
run would have surfaced an ``accepted_with_warnings`` warning for an
undeclared envelope.

The empirical check: the in-tree builder handoff for the wrong-location
quest declares ``pr_description.md`` + ``builder_feedback_discussion.md``
+ touched source files, and not ``handoff.json``. With the prior
coverage logic, that valid handoff would have logged a missing record.

Fix: the validator is invoked on a specific ``--handoff`` path; that
file's existence and identity are implicit. Exclude its normalized
path from the coverage comparison. Per-declared checks still run for
the handoff if a role chooses to declare it.

Tests:
- ``test_empty_artifacts_array_fails_for_role_with_expected_paths``
  now expects 1 missing (``plan.md``) instead of 2; the handoff
  exclusion is the design.
- Replaced ``test_omitted_canonical_artifact_fails`` (which baked
  the bug into the suite by asserting handoff-omission is a failure)
  with ``test_omitted_canonical_deliverable_fails`` covering a
  builder that omits ``builder_feedback_discussion.md``.
- Added ``test_undeclared_handoff_file_does_not_record_missing``
  pinning the new behavior with the real-world shape (handoff
  omitted from ``artifacts``; validator returns 0).

Tests: 589 unit (was 588, +1 regression test) + 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Codex flagged the second false-positive in this validator: builder and
fixer ARTIFACTS contracts legitimately list changed source/test/doc
files alongside the canonical quest deliverables. The previous
``_check_one`` treated every declared path as a quest artifact and
applied the boundary + canonical-name checks, so a normal successful
build emitted ``outside_boundary`` for every workspace file.

The wrong-location-guardrails builder handoff itself declares 16 paths
outside ``phase_02_implementation/`` (the role boundary): the hook
script, settings.json, the validator, manifest, AGENTS.md, the README,
three archived idea docs, pyproject, five test files. Under the prior
contract, that valid handoff would have logged 16 mismatch records.

Fix: classify each declared path before applying per-path checks.

* Quest-artifact path — resolves under ``<repo_root>/.quest/<id>/``.
  Full check set: traversal, nested ``.quest``, role boundary, canonical
  filename. The brief's wrong-location intent applies here.
* Workspace-file path — anywhere else inside the repo. Only the
  traversal check applies. Boundary and canonical-name aren't meaningful
  for changed source files.

Coverage check (run-level) is unchanged: every canonical quest artifact
must still be declared. Workspace files don't affect coverage because
they're not in ``expected_artifacts_for_role(...)``.

Test impact:
* ``test_outside_boundary_records_outside_boundary`` previously declared
  ``<workspace>/src/plan.md`` (outside ``.quest/``) which under the new
  classification is a workspace file, no mismatch. The test was
  asserting the wrong contract. Rewrote it to declare
  ``.quest/<id>/phase_03_review/plan.md`` for a planner whose boundary
  is ``phase_01_plan/`` — still inside ``.quest/`` but wrong phase, so
  ``outside_boundary`` fires correctly.
* Added ``test_workspace_files_outside_quest_pass_through``: builder
  declares the same canonical-plus-workspace shape as the real
  wrong-location-guardrails handoff (16+ paths). Validator must return
  0 with no records. Regression pin against the 16-false-positive
  failure mode.
* Refreshed the module docstring to document the path classification
  and the full mismatch-reason set (the prior listing omitted
  ``unsupported_role_or_phase`` which the implementation can emit).

Tests: 590 unit (was 589, +1 workspace-pass-through regression test)
+ 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Cubic flagged two real gaps in d01f20c's workspace-vs-quest path
classification:

1) Workspace-file paths got traversal-only treatment, so a builder
   could declare ``scripts/foo.py`` without ever writing it and the
   validator would silently accept. That is the exact path-drift
   failure mode the brief targets.

2) Paths under ``.quest/`` belonging to a sibling quest's directory
   (``.quest/<OTHER-id>/foo.md``) were classified as workspace files
   and passed. Writing into another quest's space is the precise
   wrong-location failure mode this validator exists to catch.

Tightened the non-current-quest branch to match cubic's suggestion:

* Workspace-file path must exist on disk; missing → ``missing``.
* Path inside ``.quest/`` but not under our quest_root → sibling-quest
  cross-write → ``outside_boundary``.
* True workspace file (outside ``.quest/`` entirely, exists on disk)
  passes — unchanged.

Two new regression tests:

* ``test_workspace_file_missing_on_disk_records_missing`` — builder
  declares a workspace path that was never written. Validator records
  ``missing`` and exits non-zero.
* ``test_sibling_quest_artifact_records_outside_boundary`` — builder
  reaches into a sibling quest's directory. Validator records
  ``outside_boundary`` and exits non-zero. The sibling file is written
  to disk so the assertion can't be shadowed by ``missing``.

Module docstring refreshed to spell out the workspace + sibling-quest
semantics.

Tests: 592 unit (was 590, +2 regression pins) + 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Two follow-up gaps in 3822f6b's non-current-quest branch, both flagged by
cubic:

1) Sibling-quest detection ran AFTER the existence check, so a declared
   path under another quest's directory that happened not to exist was
   reported as ``missing`` rather than ``outside_boundary``. The real
   failure is the cross-quest declaration; whether the write actually
   happened is secondary. Reordered to check sibling-quest membership
   first.

2) The ``relative_to(repo_root / ".quest")`` test was too broad: it
   treated every path under ``.quest/`` as a sibling-quest cross-write,
   including legitimate shared infrastructure paths. Three paths are
   cross-quest by design and must pass:

   * ``.quest/cache/...`` — preflight stores host-verified bridge probes.
   * ``.quest/backlog/deferred_findings.jsonl`` — canonical review-
     intelligence deferred-findings log.
   * ``.quest/audit.log`` — persistent tool-call audit across quest runs.

   Empirically present in this repo: ``ls .quest/`` shows ``audit.log``
   and ``backlog/`` alongside the per-quest dir.

   ``archive`` is intentionally NOT shared: writing into
   ``.quest/archive/`` mid-build would be a real wrong-location failure,
   archival is the orchestrator's completion-time ceremony.

Added ``_SHARED_DOT_QUEST_NAMES`` frozenset and check the first segment
of the path's relative-to-``.quest/`` decomposition against it. Paths
under shared names take the workspace-file pass; paths under any other
name (quest-id-keyed sibling dirs) fire ``outside_boundary``.

Two new regression tests:

* ``test_sibling_quest_missing_on_disk_still_outside_boundary`` — pins
  the reorder. Builder declares a sibling-quest path that was never
  written. Validator MUST record ``outside_boundary``, NOT ``missing``.
* ``test_shared_dot_quest_paths_pass_through`` — pins the allowlist.
  Builder declares ``.quest/cache/...``, ``.quest/backlog/...``, and
  ``.quest/audit.log`` alongside canonical deliverables. Validator
  returns 0 with no records.

Tests: 594 unit (was 592, +2 regression pins) + 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Codex flagged a worktree-mode defect in the validator: ``.quest/<id>/``
lives in the original repo, but builder and fixer roles edit files in
``source_workspace_root`` (the worktree). The validator derived
``repo_root = quest_dir.parent.parent`` for everything and used it for
both quest-artifact and workspace-file checks. In worktree mode that
meant:

* A declared relative path like ``scripts/foo.py`` resolved against the
  original repo (where the file isn't), so existence failed.
* An absolute worktree path (``/path/to/worktree/scripts/foo.py``) sat
  outside the original repo's tree, so the traversal check fired
  ``traversal_outside_repo``.

Both are false positives on every well-formed worktree build.

Fix: thread a separate ``workspace_root`` through the validator.

* ``repo_root`` — the directory containing ``.quest/<id>/``. Anchors
  quest-artifact path checks and the sibling-quest detector.
* ``workspace_root`` — where the agent was editing. Anchors workspace-
  file path checks: relative-path resolution in ``handoff.artifacts``,
  traversal containment, existence on disk.

In non-worktree mode the two are the same and existing tests still
pass. In worktree mode the orchestrator passes
``--workspace-root <source_workspace_root>``.

The CLI adds ``--workspace-root`` (optional, defaults to the repo root
so non-worktree invocations work unchanged). ``run(...)`` gains the
matching kwarg. ``_check_one`` picks the traversal anchor based on path
classification (repo_root for quest artifacts, workspace_root for
workspace files). ``_load_declared_artifacts`` resolves relative paths
against ``workspace_root`` per the workflow doctrine that orchestrators
pass absolute quest-artifact paths in worktree mode.

Workflow doc updated to include ``--workspace-root <source_workspace
_root>`` in the example invocation, with a paragraph explaining when
the flag matters.

Two new regression pins:

* ``test_worktree_mode_workspace_files_resolve_against_worktree`` —
  builder handoff with absolute quest-artifact paths plus a
  worktree-relative ``scripts/foo.py``; validator returns 0 only when
  ``workspace_root`` is the worktree.
* ``test_worktree_mode_without_workspace_root_flag_breaks`` — same
  shape but ``workspace_root`` omitted; validator MUST record
  ``traversal_outside_repo`` for the worktree path. Documents why the
  flag matters and guards against future removal.

Tests: 596 unit (was 594, +2 worktree-mode regression pins)
+ 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Cubic flagged a regression introduced by b316c3c: the new
``workspace_root`` traversal anchor was selected by ``is_quest_artifact``
(true only for our-quest paths). In worktree mode that anchored both
shared ``.quest/`` infrastructure paths (``cache/``, ``backlog/``,
``audit.log``) and sibling-quest paths (``.quest/<OTHER-id>/...``) to
the worktree root — but those paths live in the original repo, not the
worktree. The traversal check then falsely emitted
``traversal_outside_repo``.

The right invariant: any path under ``<repo>/.quest/`` belongs with the
original repo regardless of mode. Only paths outside ``.quest/`` are
workspace files.

Fix: replace the ``is_quest_artifact`` traversal-anchor selector with
an ``in_dot_quest_tree`` selector (whether the resolved path is anywhere
under ``<repo>/.quest/``). Paths under ``.quest/`` anchor to
``repo_root``; everything else anchors to ``workspace_root``. The
downstream sibling/shared classification reuses the same
``relative_to(.quest)`` computation instead of recomputing it.

In non-worktree mode the behavior is unchanged because both roots are
the same path.

Two new regression pins:

* ``test_worktree_mode_shared_dot_quest_paths_pass`` — repo and worktree
  in separate temp dirs; builder declares ``.quest/cache/...``,
  ``.quest/backlog/...``, and ``.quest/audit.log`` alongside canonical
  deliverables. Validator returns 0; under the b316c3c code these three
  would have logged ``traversal_outside_repo``.
* ``test_worktree_mode_sibling_quest_records_outside_boundary`` — repo
  and worktree separate; builder declares a sibling-quest path. The
  validator MUST report ``outside_boundary``, NOT
  ``traversal_outside_repo``.

Tests: 598 unit (was 596, +2 worktree-shared/sibling regression pins)
+ 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Two follow-ups from the worktree-mode design discussion:

1) Document the ``<worktree>/.quest -> <repo>/.quest`` symlink
   invariant in two places so the next reviewer doesn't have to
   re-derive why the validator handles worktree mode the way it does:
   - Validator module docstring gets a "Worktree symlink invariant"
     paragraph explaining that the three forms an agent might write
     for the same quest artifact (absolute repo path, absolute
     worktree path, worktree-relative path) all canonicalize to the
     same on-disk inode under the repo via Path.resolve(). The
     ``--workspace-root`` flag only governs non-``.quest/`` workspace
     files.
   - AGENTS.md's "Wrong-location guardrails" section gets a one-line
     callout pointing to the same invariant.

2) Add ``test_worktree_mode_combined_smoke`` exercising all four
   classifications in a single handoff (our canonical quest artifact,
   true workspace file, shared ``.quest/cache/`` path, sibling-quest
   cross-write) with the ``.quest`` symlink in place. The previous
   pins covered each classification individually; this one pins them
   acting together so a future refactor that reshuffles classification
   order cannot silently break one path while another still passes.

   The test uses canonical repo paths for quest artifacts per the
   workflow.md contract (orchestrators pass canonical paths even in
   worktree mode). The symlink is present as a structural element but
   not used to declare paths.

Tests: 599 unit (was 598, +1 combined smoke) + 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Codex flagged a real interaction left over from eff55b3: the coverage
check identity used ``os.path.normpath/abspath`` (cubic's anti-symlink-
collision defense from a prior round), which does NOT follow the
``<worktree>/.quest -> <repo>/.quest`` symlink. In worktree mode an
agent's worktree-relative declaration like
``.quest/<id>/.../pr_description.md`` resolves through
``_load_declared_artifacts`` against ``workspace_root`` to
``<worktree>/.quest/.../pr_description.md`` — then the coverage
identity compares it against the canonical
``<repo>/.quest/.../pr_description.md`` and emits a false ``missing``,
even though ``_check_one`` would later resolve the symlink and accept
the path.

My previous commit (eff55b3) papered over this by leaning on the
``workflow.md`` doctrine that orchestrators pass canonical repo paths
even in worktree mode. That's brittle — a well-formed handoff with a
worktree-rooted relative entry is also legitimate, and the validator
should be robust to either form.

Fix: split the coverage identity by path location.

* **Paths under ``<repo>/.quest/``** (either directly or via the
  worktree ``.quest`` symlink): canonicalize through ``Path.resolve()``
  so the worktree mount and the repo mount produce the same identity
  for the same on-disk inode.
* **Non-``.quest/`` paths** (true workspace files): keep the
  ``os.path.normpath(os.path.abspath(...))`` identity. Cubic's
  anti-symlink-collision defense remains intact for genuine workspace
  files — two distinct workspace files that happen to symlink to one
  target stay distinct identities.

This narrows cubic's defense to where it actually applies (workspace
files), and trusts ``expected_artifacts_for_role`` to return logically
distinct paths inside ``.quest/`` so resolving through the symlink
cannot collapse them.

Updated the module docstring's "Worktree symlink invariant" section to
reflect the new behavior (validator actively canonicalizes, not just
relies on the orchestrator contract).

New regression test:
``test_worktree_mode_quest_artifact_via_symlinked_mount`` — agent
declares one quest artifact as an absolute worktree path and one as a
worktree-relative path, with the ``.quest`` symlink in place. Coverage
must NOT emit ``missing`` for either; both paths resolve to canonical
repo inodes.

Tests: 600 unit (was 599, +1 symlink-declare regression pin)
+ 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
Two Codex findings on ``9e00998``, both ``Should fix``, both real:

1) The orchestrator instruction in ``workflow.md`` surfaced records
   from ``.quest/<id>/logs/path_compliance.log`` to the user. That log
   is the append-only audit trail across every validator invocation in
   the quest. Reading it for current-run attribution surfaces stale
   records from earlier roles, retries, or iterations and
   misattributes them to the active handoff.

   Fix: the CLI now emits one JSON line per current-run mismatch to
   stdout (in addition to the log append). Stdout stays empty on a
   clean run (exit 0). The orchestrator surfaces stdout to the user
   instead of reading the log; the log remains the forensic trail.

   ``_emit_mismatch_lines`` is a new helper alongside
   ``_append_mismatch_lines``. Both run-level return sites (validator
   exception path and the normal mismatch path) emit + append in
   lockstep.

   ``workflow.md`` step 7 updated to reflect the new attribution
   contract — surface stdout, treat the log as audit-only.

2) ``.claude/hooks/branch-dir-context.sh`` collapses LF/CR in the
   working directory with ``tr``. The hook conditionally handles
   ``git`` being absent (PATH-restricted sandboxes), but the same
   sandboxes can drop ``tr`` too — both live in ``/usr/bin``. A
   ``tr: command not found`` on stderr would break the hook's
   no-stderr / single-line contract.

   Fix: use bash parameter expansion ``${var//pattern/replacement}``
   for the LF/CR substitution. It's a builtin (bash 2.0+, 1996), no
   PATH lookup, no external command. The hook now has zero external
   dependencies on the formatting path — only ``git`` remains, and
   the surrounding code already handles its absence.

Two new regression pins under ``TestCliStdoutContract``:

* ``test_clean_run_emits_no_stdout`` — exit 0 ⇒ stdout empty. Lets
  callers use stdout presence as a "had mismatches" signal.
* ``test_mismatch_run_emits_only_current_records_to_stdout`` — two
  sequential invocations on the same quest dir (builder, then
  planner). Each invocation's stdout contains ONLY that run's
  mismatches. The persistent log retains BOTH (audit trail intact).
  Pins the stale-attribution failure mode Codex flagged.

Tests: 602 unit (was 600, +2 stdout-contract pins) + 2 perf green.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
PR #116 review feedback: the rationale for using bash parameter
expansion over ``tr`` invoked hypothetical PATH-restricted sandboxes
missing coreutils — YAGNI for the platforms we support. Keep the
builtin implementation; drop the justification prose.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
in collaboration with the repository author
PR #116 review (Codex, Must fix): the per-path boundary check used
``resolved.relative_to(boundary_dir)``, which accepts any depth under
the phase directory. A handoff declaring both the real canonical path
(satisfying coverage) and an off-spec ``phase_XX/nested/<canonical>.md``
slipped through — ``_check_one`` accepted the nested entry because it
was under boundary and had a canonical basename.

Require exact parent equality (``resolved.parent == boundary_dir``).
Expected canonical paths share the same phase dir by construction, so
the tighter invariant is consistent with the artifact contract.

Regression pin: ``test_extra_canonical_named_file_in_nested_subdir_
records_outside_boundary`` declares both the canonical and the nested
duplicate and asserts the nested declaration records
``outside_boundary``.

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Codex <noreply@openai.com>
in collaboration with the repository author
…inks

PR #116 review (Codex, Must fix): a canonical quest path implemented as
a symlink to a workspace file (e.g. ``pr_description.md`` -> ``scripts/
foo.py``) slipped through. ``_check_one`` classified by the resolved
target, treated the symlink as a workspace file, and accepted it; the
expected-identity comparison in coverage matched on both sides because
both resolve to the same target.

Two related fixes in ``_check_one``/``run``:

1) Classify by the DECLARED path's directory, not the resolved target.
   ``declared_path.parent.resolve() / declared_path.name`` canonicalizes
   the worktree ``.quest`` directory symlink while leaving any
   file-level symlink visible. A symlink at a canonical quest path is
   now classified as a quest artifact and reaches the boundary check.

2) Resolve ``boundary_dir``'s parent only, not the file. ``Path(
   expected_paths[0]).resolve().parent`` would follow a file-level
   symlink on the canonical path and corrupt the boundary to wherever
   the link pointed. Use ``Path(expected_paths[0]).parent.resolve()``
   so the boundary stays anchored to the phase directory regardless of
   whether the canonical artifact was provisioned as a real file or a
   symlink.

Regression pin: ``test_quest_artifact_symlink_escaping_boundary_records_
outside_boundary`` provisions ``pr_description.md`` as a symlink to
``<workspace>/scripts/foo.py``, declares it alongside a real second
canonical, and asserts ``outside_boundary`` on the symlink declaration.

Tests: 781 unit/integration green (one pre-existing dashboard test
failure on this branch is unrelated).

Quest/Co-Authored by
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Codex <noreply@openai.com>
in collaboration with the repository author
@KjellKod KjellKod force-pushed the claude/review-intelligence-roadmap-i0my2 branch from ad22640 to aec30b1 Compare May 29, 2026 18:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants