azalio · Jul 2, 2026 · coderabbitai · Jul 2, 2026
diff --git a/.claude/skills/map-release/SKILL.md b/.claude/skills/map-release/SKILL.md
@@ -198,8 +198,12 @@ if [[ -n "$LAST_TAG" ]]; then
   # maintenance commits, which otherwise make this heuristic chase its own fixes.
   COMMITS_SINCE=$(git log ${LAST_TAG}..HEAD --no-merges --format="%s" | awk '!/^(docs\(changelog\)|chore\(release\):)/ { count++ } END { print count + 0 }')
 
-  # Count CHANGELOG entries in [Unreleased] section
-  CHANGELOG_ENTRIES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | grep -cE "^- " || echo "0")
+  # Count CHANGELOG entries in the [Unreleased] section.
+  # NOTE: an awk range pattern (/start/,/end/) ends on the line that opens
+  # it whenever that line also matches /end/, and "## [Unreleased]" matches
+  # both /## \[Unreleased\]/ and /## \[/. Track the section with an explicit
+  # flag instead, so the output covers the lines after the heading.
+  CHANGELOG_ENTRIES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md | grep -cE "^- " || true)
 
   echo "Counted commits since $LAST_TAG: $COMMITS_SINCE"
   echo "(excluding docs(changelog) and chore(release) maintenance commits)"
@@ -216,7 +220,7 @@ if [[ -n "$LAST_TAG" ]]; then
     echo "════════════════════════════════════════════════════════"
     echo ""
     echo "Current CHANGELOG [Unreleased] content:"
-    awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d'
+    awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md
     echo ""
 
     # Ask user to update CHANGELOG
@@ -289,7 +293,7 @@ Read CHANGELOG.md [Unreleased] section to determine bump type:
 
 ```bash
 # Extract unreleased changes
-UNRELEASED_CHANGES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d')
+UNRELEASED_CHANGES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md)
 ```
 
 **Semantic Versioning Rules:**

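The range-pattern collapse described in the new comment can be reproduced on a throwaway file. This is a minimal sketch, assuming a made-up changelog (the entries and the `1.0.0` heading are invented for the demonstration); the two awk programs are the ones on the removed and added lines of this diff.

```shell
# Throwaway changelog used only for this demonstration (contents are made up).
demo=$(mktemp)
cat > "$demo" <<'EOF'
# Changelog

## [Unreleased]

### Added
- first entry
- second entry

## [1.0.0] - 2026-01-01
- released entry
EOF

# Old form: the heading line opens the range and also matches /## \[/,
# so the range closes on that same line and only the heading is printed.
old_out=$(awk '/## \[Unreleased\]/,/## \[/' "$demo")

# New form: f is set on the heading (skipped via next), cleared on the
# following "## [" heading, and the bare pattern f prints what lies between.
new_out=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' "$demo")

# grep -c exits 1 when it counts zero lines, hence the "|| true".
old_count=$(printf '%s\n' "$old_out" | grep -cE '^- ' || true)
new_count=$(printf '%s\n' "$new_out" | grep -cE '^- ' || true)

echo "range pattern: $old_count entries"   # range pattern: 0 entries
echo "flag:          $new_count entries"   # flag:          2 entries
rm -f "$demo"
```

Because the flag version never prints either heading, the trailing `sed '$d'` that the old pipelines used to drop the closing heading is no longer needed, which matches its removal in the later hunks.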
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -18,9 +18,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **Parallel-wave merge coordinator for worktree isolation (`merge_wave_worktrees`, part of #284 Phase 2).** Wires the existing wave/DAG scheduler to per-subtask worktree isolation so a parallel wave's independent subtasks each run in their own worktree and are accepted **atomically**. Every worktree of a wave is cut off the same base (HEAD at wave start), so they cannot be merged one at a time — the first `merge_subtask_worktree` advances HEAD and the next trips `BASE_DIVERGED`. The new coordinator relaxes *only* that guard to a wave-scoped form: it refuses **external** HEAD movement (`EXTERNAL_HEAD_MOVED`) but allows the sibling divergence each in-wave squash-merge creates. It derives `wave_base_sha` from the sidecar (never a caller parameter), preflights every worktree (commit + per-worktree guards + pre-merge verify) BEFORE touching the working branch, then squash-merges each accepted worktree **by frozen SHA in sorted id order** (one runner commit per subtask — the one-commit-per-subtask contract holds), then runs **one post-wave full gate on the merged tree inside the same transaction**. It is **all-or-nothing** (council-reviewed, conv `c29d6fa9`): any textual conflict, commit failure, or post-wave-gate failure rolls the whole working branch back to the wave base via `git reset --hard` + `git clean -fd` (squash leaves no `MERGE_HEAD`, so `git merge --abort` is never used; MAP runtime state is excluded from the clean) and leaves **every** worktree intact for retry — no partial-wave state ever survives. Safety extras: an advisory `flock` serializes coordinators (`MERGE_IN_PROGRESS`); attached-/clean-target preconditions; conflicted paths are attributed back to the subtasks that touched them (declared-disjoint `affected_files` is only a scheduler hint, so actual changed-file overlap is reported as advisory telemetry while git's textual conflict stays the hard guard). The shared `_wt_freeze_and_verify` primitive (commit + guards + pre-merge verify) is extracted once and reused by both the single-subtask and wave merge paths. CLI: `merge_wave_worktrees <ST…> [--branch B] [--verify-cmd CMD…] [--skip-verify] [--post-wave-cmd CMD…] [--skip-post-wave]`. Phase 3 (context-budget hooks) remains open on #284.
 - **Per-subtask git worktree isolation for `/map-efficient` (`worktree.isolation`, part of #284).** Opt-in, OFF by default. When enabled, each subtask's Actor runs in a dedicated, throwaway git worktree and its result is squash-merged back into the working branch ONLY after the configured `verification_checks` pass IN the worktree (a **pre-merge** gate, strictly stronger than today's post-commit check) — a rejected attempt (Monitor `valid=false` / Evaluator fail) is discarded so the working branch is never touched by a bad attempt. The Python step runner owns the whole lifecycle and every safety guard (producer-owns-parse): `create_subtask_worktree` (crash-safe remove-and-recreate; guards: not-a-repo, protected-ref, nested-worktree refusal, active-git-op, `subtask_id` ref/path sanitization, dirty-main refusal, submodule init), `merge_subtask_worktree` (guards run BEFORE the working branch is touched: base-divergence `git merge-base` check, runtime-state-in-diff, configurable bulk-deletion threshold `worktree.max_deletions`, submodule-pointer change, detached-HEAD, then the pre-merge verify gate; accept = `git merge --squash` + one runner-authored commit, never `--no-ff`, preserving one-commit-per-subtask), `discard_subtask_worktree` (atomic reject, idempotent, optional `--save-patch` forensics), and `worktree_isolation_status` (reconciles recorded vs live worktrees). Worktrees are stored OUT of the working tree under the repo's git common dir (`<git-common-dir>/map-framework/worktrees/`), so `git clean -fdx`, recursive scanners, and accidental commits can never touch them; MAP runtime state (`.map/<branch>/...`) always resolves against the main checkout — state-mutating commands refuse if invoked from inside a managed worktree (the silent state-desync footgun). Every guard returns a structured `{kind, message}` the skill branches on. Config keys `worktree.{isolation,max_deletions}`; new `worktree` manifest stage; `.map/<branch>/worktrees.json` sidecar. Design was llm-council-reviewed (runner-owned worktrees over harness-native `isolation="worktree"`; squash-merge over `--no-ff`; always-discard on reject; pre-merge verification + crash-safe retry + atomic reject folded in so the slice is not a no-op; explicit state-root separation). Phase 2 (wave/DAG parallelism) and Phase 3 (context-budget hooks) remain open on #284.
 - **Cross-AI peer review for `/map-review` (`--cross-ai <runtime>`, part of #288).** `/map-review --cross-ai codex|gemini|claude|opencode` dispatches the review to an INDEPENDENT external AI CLI for a true second opinion (different model/vendor, fresh context with no shared session). The dispatch, parsing, normalization, and untrusted-wrapping all live in the Python step runner (`run_cross_ai_review` / `dispatch_cross_ai_review`, producer-owns-parse) — the skill only handles consent and presentation. Egress is **double-consent**: the per-run `--cross-ai` flag AND `review.cross_ai.enabled: true` in `.map/config.yaml` (off by default) are both required, because the diff/code leaves the machine. Mandatory guardrails: a **high-confidence outbound secret scan** (private keys, AWS/GitHub/Google/Slack credentials) BLOCKS dispatch before the subprocess and surfaces only the pattern name, never the value; the external CLI is invoked `shell=False` with a literal-argv adapter and a configurable timeout; the returned findings ALWAYS enter context behind an `EXTERNAL UNTRUSTED REFERENCE` fence (link/injection scan, applied deterministically in Python so the model cannot skip it) and are advisory-only (`source: cross_ai`, never auto-applied); same-vendor runtimes (`claude`) are honestly labeled `independent_vendor: false`. Any dispatch failure (disabled, CLI missing, not authenticated, timeout, non-JSON output, secret-blocked) degrades non-blockingly and falls back to the in-session review. Config keys `review.cross_ai.{enabled,runtime,timeout_seconds}`. Design was llm-council-reviewed (Python-owned dispatch; single-runtime slice with `--cross-ai all` consensus deferred to a follow-up slice).
+- **Adversarial multi-perspective code review (`/map-review --adversarial`).** Runs three parallel independent reviewers with isolated contexts instead of a single monitor pass: Blind Hunter (diff-only, unbiased by stated intent), Edge Case Hunter (diff + repo read; null handling, boundaries, error paths), and Acceptance Auditor (diff + spec + artifacts; missed requirements, AC gaps). Adds a `--quick` flag (Blind + Acceptance, skips Edge Case) and a `--show-raw-findings` debug flag. Findings use a structured severity/category/evidence/failure_mode schema, deduplicated via deterministic clustering with corroboration signals, and rolled up into a unified report with a convergence section and all-clear statements. New `build_adversarial_review_prompts()` / `aggregate_adversarial_findings()` in the step runner, plus an `adversarial-reference.md` workflow doc. This is the Claude-side feature the Codex port (above) mirrors.
+- **`mapify tokenreport` dashboard, history, estimate, and export modes (closes #289).** `token_report_dashboard()` adds a box-drawing visual layout (session summary, per-subtask bar chart, per-agent/model breakdowns, vs-previous-session comparison); `record_session_snapshot()` persists `token_history.jsonl` for `token_report_history()` trend analysis; `token_report_estimate()` gives a weighted cost projection; `token_report_json()` / `token_report_csv()` support CI/export. New CLI flags: `--dashboard`, `--history`, `--json`, `--csv`, `--estimate`, `--finalize`.
+- **Learned rules scoped by `path_glob` (closes #280).** Rules with a `paths:` frontmatter key are now filtered before Actor context and personal-rules injection, and only load when the agent is working on matching files — aligning with Claude Code's hierarchical rule-loading pattern instead of injecting every learned rule into every subtask regardless of relevance.
+- **Auto-created GitHub Release in the release CI workflow (closes #279).** `release.yml` now uses `softprops/action-gh-release@v2` to auto-create the GitHub Release (with a changelog excerpt) on tag publish, with the required `contents: write` / `id-token: write` permissions. The manual Phase 5.4 (`gh release create`) step is dropped from the `/map-release` skill; the summary/checklist now reference the auto-created release URL instead.
 
 ### Fixed
 - **`detect_actor_files_changed_mismatch` no longer false-positives on MAP-only subtask artifacts (closes #277).** The actor files-changed gate validated every declared file against `_current_subtask_changed_files`, which derives from `git diff`/`git status` and strips the gitignored framework trees (`.map/`, `.codex/`, `.agents/`). A subtask whose only declared `affected_files` entry was a MAP artifact (e.g. `.map/<branch>/verification-summary.md`) therefore always reported `status_mismatch=true` with a false "Actor declared files it did not write" recovery instruction, making MAP-only documentation/verification subtasks look like truncated actor edits. The detector now partitions declared files: git-tracked files keep the diff check, while MAP-internal artifacts are validated by filesystem existence + non-empty content (a missing or empty artifact is still a real mismatch). MAP-artifact validation is independent of git availability, so a MAP-only subtask is never forced into a false mismatch by a git error. A new shared `_is_map_internal_artifact` helper de-duplicates the framework-tree prefix list used by both the strip filter and the new validation path.
+- **Workflow-context injection no longer fires on a terminal `COMPLETE` state (closes #317).** When `step_state.json` has `current_step_id` or `current_step_phase` equal to `"COMPLETE"`, `format_reminder()` now returns `None` immediately via a terminal-state guard, so the hook emits `{}` instead of a misleading "REQUIRED: Complete phase COMPLETE" banner after a workflow has already finished. Added a `_TERMINAL_STEP_IDS` frozenset constant and regression tests covering both the subprocess-integration and unit (`format_reminder`) paths.
+- **`record_test_baseline` timeout is now fail-safe, not fail-open (closes #307).** When the baseline subprocess timed out it never finished, so `baseline_failures` was always `[]` — indistinguishable from a genuinely clean suite, silently treating any pre-existing failure as "not pre-existing" and defeating the regression-vs-pre-existing distinction. Status is now `"timed_out"` (distinct from `"baseline_failures"`); a new `baseline_complete: bool` field is `false` on timeout so downstream code can check it before trusting an empty baseline; `list_baseline_failures` propagates `baseline_complete`/`timed_out` and emits a `warning` key when the stored baseline is incomplete. Default `timeout_seconds` raised from 120 to 600 to give most suites room to finish; `--timeout` still accepts an explicit value.
+- **Bare-basename spec citations now auto-resolve instead of hard-failing (closes #301, closes #300).** `validate_spec_citations.py` resolves a bare filename citation (e.g. `api.ts:80`) automatically when it is unique in the repo; an ambiguous bare basename now produces a non-blocking warning instead of a hard error, and a genuinely missing file gets a clearer error message. Separately, `/map-plan` Step 0's research-agent now writes its full report directly to disk (with the pipe-based fallback kept), documenting the `SendMessage` vs. new-`Agent()` footgun for future skill authors.
 
 ## [3.20.0] - 2026-06-26
 

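The fail-safe versus fail-open distinction in the `record_test_baseline` entry can be sketched in shell. This is an illustration of the idea only, not the project's Python implementation: `record_baseline` and the `clean` status are hypothetical names, and the sketch assumes GNU `timeout`, which exits with status 124 when the limit is hit. Only `timed_out`, `baseline_failures`, and `baseline_complete` come from the changelog text.

```shell
# Hypothetical helper: run a test command under a time limit and report
# whether the resulting baseline can be trusted.
record_baseline() {
  limit="$1"
  shift
  # GNU timeout exits 124 when the command is killed for exceeding the limit.
  timeout "$limit" "$@" >/dev/null 2>&1 && rc=0 || rc=$?
  if [ "$rc" -eq 124 ]; then
    # The suite never finished, so an empty failure list proves nothing.
    echo "status=timed_out baseline_complete=false"
  elif [ "$rc" -eq 0 ]; then
    # "clean" is a made-up status name for this sketch.
    echo "status=clean baseline_complete=true"
  else
    echo "status=baseline_failures baseline_complete=true"
  fi
}

record_baseline 1 sleep 3   # status=timed_out baseline_complete=false
record_baseline 5 true      # status=clean baseline_complete=true
record_baseline 5 false     # status=baseline_failures baseline_complete=true
```

A caller that checks `baseline_complete` before reading the failure list cannot mistake a timed-out run for a clean suite, which is the failure mode the entry describes.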
diff --git a/src/mapify_cli/templates/skills/map-release/SKILL.md b/src/mapify_cli/templates/skills/map-release/SKILL.md
@@ -198,8 +198,12 @@ if [[ -n "$LAST_TAG" ]]; then
   # maintenance commits, which otherwise make this heuristic chase its own fixes.
   COMMITS_SINCE=$(git log ${LAST_TAG}..HEAD --no-merges --format="%s" | awk '!/^(docs\(changelog\)|chore\(release\):)/ { count++ } END { print count + 0 }')
 
-  # Count CHANGELOG entries in [Unreleased] section
-  CHANGELOG_ENTRIES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | grep -cE "^- " || echo "0")
+  # Count CHANGELOG entries in the [Unreleased] section.
+  # NOTE: an awk range pattern (/start/,/end/) ends on the line that opens
+  # it whenever that line also matches /end/, and "## [Unreleased]" matches
+  # both /## \[Unreleased\]/ and /## \[/. Track the section with an explicit
+  # flag instead, so the output covers the lines after the heading.
+  CHANGELOG_ENTRIES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md | grep -cE "^- " || true)
 
   echo "Counted commits since $LAST_TAG: $COMMITS_SINCE"
   echo "(excluding docs(changelog) and chore(release) maintenance commits)"
@@ -216,7 +220,7 @@ if [[ -n "$LAST_TAG" ]]; then
     echo "════════════════════════════════════════════════════════"
     echo ""
     echo "Current CHANGELOG [Unreleased] content:"
-    awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d'
+    awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md
     echo ""
 
     # Ask user to update CHANGELOG
@@ -289,7 +293,7 @@ Read CHANGELOG.md [Unreleased] section to determine bump type:
 
 ```bash
 # Extract unreleased changes
-UNRELEASED_CHANGES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d')
+UNRELEASED_CHANGES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md)
 ```
 
 **Semantic Versioning Rules:**

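The `COMMITS_SINCE` context line in these hunks filters maintenance commits out of the count before comparing it with the changelog. A small sketch of that filter follows, using made-up commit subjects in place of real `git log --format="%s"` output; `count_commits` is a hypothetical wrapper around the awk program from the diff.

```shell
# Made-up commit subjects standing in for `git log --format="%s"` output.
subjects='feat: add wave merge coordinator
docs(changelog): backfill unreleased entries
chore(release): 3.20.0
fix: count unreleased changelog entries'

# Hypothetical wrapper around the awk filter used for COMMITS_SINCE.
count_commits() {
  # Skip subjects starting with "docs(changelog)" or "chore(release):".
  # "count + 0" forces a numeric 0 when no line was counted at all.
  awk '!/^(docs\(changelog\)|chore\(release\):)/ { count++ } END { print count + 0 }'
}

counted=$(printf '%s\n' "$subjects" | count_commits)
none=$(printf '%s\n' 'chore(release): 3.20.0' | count_commits)
echo "counted=$counted none=$none"   # counted=2 none=0
```

Without the `+ 0`, awk would print an empty string when every subject is filtered out, and a later numeric comparison on the variable would fail.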
diff --git a/src/mapify_cli/templates_src/skills/map-release/SKILL.md.jinja b/src/mapify_cli/templates_src/skills/map-release/SKILL.md.jinja
@@ -198,8 +198,12 @@ if [[ -n "$LAST_TAG" ]]; then
   # maintenance commits, which otherwise make this heuristic chase its own fixes.
   COMMITS_SINCE=$(git log ${LAST_TAG}..HEAD --no-merges --format="%s" | awk '!/^(docs\(changelog\)|chore\(release\):)/ { count++ } END { print count + 0 }')
 
-  # Count CHANGELOG entries in [Unreleased] section
-  CHANGELOG_ENTRIES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | grep -cE "^- " || echo "0")
+  # Count CHANGELOG entries in the [Unreleased] section.
+  # NOTE: an awk range pattern (/start/,/end/) ends on the line that opens
+  # it whenever that line also matches /end/, and "## [Unreleased]" matches
+  # both /## \[Unreleased\]/ and /## \[/. Track the section with an explicit
+  # flag instead, so the output covers the lines after the heading.
+  CHANGELOG_ENTRIES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md | grep -cE "^- " || true)
 
   echo "Counted commits since $LAST_TAG: $COMMITS_SINCE"
   echo "(excluding docs(changelog) and chore(release) maintenance commits)"
@@ -216,7 +220,7 @@ if [[ -n "$LAST_TAG" ]]; then
     echo "════════════════════════════════════════════════════════"
     echo ""
     echo "Current CHANGELOG [Unreleased] content:"
-    awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d'
+    awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md
     echo ""
 
     # Ask user to update CHANGELOG
@@ -289,7 +293,7 @@ Read CHANGELOG.md [Unreleased] section to determine bump type:
 
 ```bash
 # Extract unreleased changes
-UNRELEASED_CHANGES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d')
+UNRELEASED_CHANGES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md)
 ```
 
 **Semantic Versioning Rules:**