12 changes: 8 additions & 4 deletions .claude/skills/map-release/SKILL.md
@@ -198,8 +198,12 @@ if [[ -n "$LAST_TAG" ]]; then
# maintenance commits, which otherwise make this heuristic chase its own fixes.
COMMITS_SINCE=$(git log ${LAST_TAG}..HEAD --no-merges --format="%s" | awk '!/^(docs\(changelog\)|chore\(release\):)/ { count++ } END { print count + 0 }')

-# Count CHANGELOG entries in [Unreleased] section
-CHANGELOG_ENTRIES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | grep -cE "^- " || echo "0")
+# Count CHANGELOG entries in [Unreleased] section.
+# NOTE: a range-pattern awk (/start/,/end/) collapses to the single
+# matching line when start and end match the SAME line — and "##
+# [Unreleased]" matches both "/## \[Unreleased\]/" and "/## \[/". Use an
+# explicit flag instead so the range spans past the heading line itself.
+CHANGELOG_ENTRIES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md | grep -cE "^- " || echo "0")

echo "Counted commits since $LAST_TAG: $COMMITS_SINCE"
echo "(excluding docs(changelog) and chore(release) maintenance commits)"
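
Reviewer note: the range-collapse behaviour described in the comment above can be reproduced in isolation. The following is a minimal sketch, assuming a small hypothetical changelog written to a temp file (the entries and the `1.0.0` heading are made up for the demo); it runs the old range pattern and the new flag pattern side by side.

```bash
#!/usr/bin/env bash
# Hypothetical sample changelog for the demo; not the project's real file.
changelog=$(mktemp)
cat > "$changelog" <<'EOF'
# Changelog

## [Unreleased]

### Added
- first entry
- second entry

## [1.0.0] - 2024-01-01
- old entry
EOF

# Old form: the start line "## [Unreleased]" also matches the end
# pattern /## \[/, so the range opens and closes on the same record.
range_out=$(awk '/## \[Unreleased\]/,/## \[/' "$changelog")

# New form: set a flag after the heading, clear it at the next version
# heading, and print only while the flag is set.
flag_out=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' "$changelog")

echo "range pattern output:"
printf '%s\n' "$range_out"
echo "flag pattern output:"
printf '%s\n' "$flag_out"

rm -f "$changelog"
```

Because the flag form skips the heading with `next` and stops before the following heading, no trailing line needs trimming, which is presumably why the `sed '$d'` stage is dropped in the later hunks.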
@@ -216,7 +220,7 @@ if [[ -n "$LAST_TAG" ]]; then
echo "════════════════════════════════════════════════════════"
echo ""
echo "Current CHANGELOG [Unreleased] content:"
-awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d'
+awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md
echo ""

# Ask user to update CHANGELOG
@@ -289,7 +293,7 @@ Read CHANGELOG.md [Unreleased] section to determine bump type:

```bash
# Extract unreleased changes
-UNRELEASED_CHANGES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d')
+UNRELEASED_CHANGES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md)
```

**Semantic Versioning Rules:**
7 changes: 7 additions & 0 deletions CHANGELOG.md
@@ -18,9 +18,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **Parallel-wave merge coordinator for worktree isolation (`merge_wave_worktrees`, part of #284 Phase 2).** Wires the existing wave/DAG scheduler to per-subtask worktree isolation so a parallel wave's independent subtasks each run in their own worktree and are accepted **atomically**. Every worktree of a wave is cut off the same base (HEAD at wave start), so they cannot be merged one at a time — the first `merge_subtask_worktree` advances HEAD and the next trips `BASE_DIVERGED`. The new coordinator relaxes *only* that guard to a wave-scoped form: it refuses **external** HEAD movement (`EXTERNAL_HEAD_MOVED`) but allows the sibling divergence each in-wave squash-merge creates. It derives `wave_base_sha` from the sidecar (never a caller parameter), preflights every worktree (commit + per-worktree guards + pre-merge verify) BEFORE touching the working branch, then squash-merges each accepted worktree **by frozen SHA in sorted id order** (one runner commit per subtask — the one-commit-per-subtask contract holds), then runs **one post-wave full gate on the merged tree inside the same transaction**. It is **all-or-nothing** (council-reviewed, conv `c29d6fa9`): any textual conflict, commit failure, or post-wave-gate failure rolls the whole working branch back to the wave base via `git reset --hard` + `git clean -fd` (squash leaves no `MERGE_HEAD`, so `git merge --abort` is never used; MAP runtime state is excluded from the clean) and leaves **every** worktree intact for retry — no partial-wave state ever survives. Safety extras: an advisory `flock` serializes coordinators (`MERGE_IN_PROGRESS`); attached-/clean-target preconditions; conflicted paths are attributed back to the subtasks that touched them (declared-disjoint `affected_files` is only a scheduler hint, so actual changed-file overlap is reported as advisory telemetry while git's textual conflict stays the hard guard). 
The shared `_wt_freeze_and_verify` primitive (commit + guards + pre-merge verify) is extracted once and reused by both the single-subtask and wave merge paths. CLI: `merge_wave_worktrees <ST…> [--branch B] [--verify-cmd CMD…] [--skip-verify] [--post-wave-cmd CMD…] [--skip-post-wave]`. Phase 3 (context-budget hooks) remains open on #284.
- **Per-subtask git worktree isolation for `/map-efficient` (`worktree.isolation`, part of #284).** Opt-in, OFF by default. When enabled, each subtask's Actor runs in a dedicated, throwaway git worktree and its result is squash-merged back into the working branch ONLY after the configured `verification_checks` pass IN the worktree (a **pre-merge** gate, strictly stronger than today's post-commit check) — a rejected attempt (Monitor `valid=false` / Evaluator fail) is discarded so the working branch is never touched by a bad attempt. The Python step runner owns the whole lifecycle and every safety guard (producer-owns-parse): `create_subtask_worktree` (crash-safe remove-and-recreate; guards: not-a-repo, protected-ref, nested-worktree refusal, active-git-op, `subtask_id` ref/path sanitization, dirty-main refusal, submodule init), `merge_subtask_worktree` (guards run BEFORE the working branch is touched: base-divergence `git merge-base` check, runtime-state-in-diff, configurable bulk-deletion threshold `worktree.max_deletions`, submodule-pointer change, detached-HEAD, then the pre-merge verify gate; accept = `git merge --squash` + one runner-authored commit, never `--no-ff`, preserving one-commit-per-subtask), `discard_subtask_worktree` (atomic reject, idempotent, optional `--save-patch` forensics), and `worktree_isolation_status` (reconciles recorded vs live worktrees). Worktrees are stored OUT of the working tree under the repo's git common dir (`<git-common-dir>/map-framework/worktrees/`), so `git clean -fdx`, recursive scanners, and accidental commits can never touch them; MAP runtime state (`.map/<branch>/...`) always resolves against the main checkout — state-mutating commands refuse if invoked from inside a managed worktree (the silent state-desync footgun). Every guard returns a structured `{kind, message}` the skill branches on. Config keys `worktree.{isolation,max_deletions}`; new `worktree` manifest stage; `.map/<branch>/worktrees.json` sidecar. 
Design was llm-council-reviewed (runner-owned worktrees over harness-native `isolation="worktree"`; squash-merge over `--no-ff`; always-discard on reject; pre-merge verification + crash-safe retry + atomic reject folded in so the slice is not a no-op; explicit state-root separation). Phase 2 (wave/DAG parallelism) and Phase 3 (context-budget hooks) remain open on #284.
- **Cross-AI peer review for `/map-review` (`--cross-ai <runtime>`, part of #288).** `/map-review --cross-ai codex|gemini|claude|opencode` dispatches the review to an INDEPENDENT external AI CLI for a true second opinion (different model/vendor, fresh context with no shared session). The dispatch, parsing, normalization, and untrusted-wrapping all live in the Python step runner (`run_cross_ai_review` / `dispatch_cross_ai_review`, producer-owns-parse) — the skill only handles consent and presentation. Egress is **double-consent**: the per-run `--cross-ai` flag AND `review.cross_ai.enabled: true` in `.map/config.yaml` (off by default) are both required, because the diff/code leaves the machine. Mandatory guardrails: a **high-confidence outbound secret scan** (private keys, AWS/GitHub/Google/Slack credentials) BLOCKS dispatch before the subprocess and surfaces only the pattern name, never the value; the external CLI is invoked `shell=False` with a literal-argv adapter and a configurable timeout; the returned findings ALWAYS enter context behind an `EXTERNAL UNTRUSTED REFERENCE` fence (link/injection scan, applied deterministically in Python so the model cannot skip it) and are advisory-only (`source: cross_ai`, never auto-applied); same-vendor runtimes (`claude`) are honestly labeled `independent_vendor: false`. Any dispatch failure (disabled, CLI missing, not authenticated, timeout, non-JSON output, secret-blocked) degrades non-blockingly and falls back to the in-session review. Config keys `review.cross_ai.{enabled,runtime,timeout_seconds}`. Design was llm-council-reviewed (Python-owned dispatch; single-runtime slice with `--cross-ai all` consensus deferred to a follow-up slice).
- **Adversarial multi-perspective code review (`/map-review --adversarial`).** Runs three parallel independent reviewers with isolated contexts instead of a single monitor pass: Blind Hunter (diff-only, unbiased by stated intent), Edge Case Hunter (diff + repo read; null handling, boundaries, error paths), and Acceptance Auditor (diff + spec + artifacts; missed requirements, AC gaps). Adds a `--quick` flag (Blind + Acceptance, skips Edge Case) and a `--show-raw-findings` debug flag. Findings use a structured severity/category/evidence/failure_mode schema, deduplicated via deterministic clustering with corroboration signals, and rolled up into a unified report with a convergence section and all-clear statements. New `build_adversarial_review_prompts()` / `aggregate_adversarial_findings()` in the step runner, plus an `adversarial-reference.md` workflow doc. This is the Claude-side feature the Codex port (above) mirrors.
- **`mapify tokenreport` dashboard, history, estimate, and export modes (closes #289).** `token_report_dashboard()` adds a box-drawing visual layout (session summary, per-subtask bar chart, per-agent/model breakdowns, vs-previous-session comparison); `record_session_snapshot()` persists `token_history.jsonl` for `token_report_history()` trend analysis; `token_report_estimate()` gives a weighted cost projection; `token_report_json()` / `token_report_csv()` support CI/export. New CLI flags: `--dashboard`, `--history`, `--json`, `--csv`, `--estimate`, `--finalize`.
- **Learned rules scoped by `path_glob` (closes #280).** Rules with a `paths:` frontmatter key are now filtered before Actor context and personal-rules injection, and only load when the agent is working on matching files — aligning with Claude Code's hierarchical rule-loading pattern instead of injecting every learned rule into every subtask regardless of relevance.
- **Auto-created GitHub Release in the release CI workflow (closes #279).** `release.yml` now uses `softprops/action-gh-release@v2` to auto-create the GitHub Release (with a changelog excerpt) on tag publish, with the required `contents: write` / `id-token: write` permissions. The manual Phase 5.4 (`gh release create`) step is dropped from the `/map-release` skill; the summary/checklist now reference the auto-created release URL instead.

### Fixed
- **`detect_actor_files_changed_mismatch` no longer false-positives on MAP-only subtask artifacts (closes #277).** The actor files-changed gate validated every declared file against `_current_subtask_changed_files`, which derives from `git diff`/`git status` and strips the gitignored framework trees (`.map/`, `.codex/`, `.agents/`). A subtask whose only declared `affected_files` entry was a MAP artifact (e.g. `.map/<branch>/verification-summary.md`) therefore always reported `status_mismatch=true` with a false "Actor declared files it did not write" recovery instruction, making MAP-only documentation/verification subtasks look like truncated actor edits. The detector now partitions declared files: git-tracked files keep the diff check, while MAP-internal artifacts are validated by filesystem existence + non-empty content (a missing or empty artifact is still a real mismatch). MAP-artifact validation is independent of git availability, so a MAP-only subtask is never forced into a false mismatch by a git error. A new shared `_is_map_internal_artifact` helper de-duplicates the framework-tree prefix list used by both the strip filter and the new validation path.
- **Workflow-context injection no longer fires on a terminal `COMPLETE` state (closes #317).** When `step_state.json` has `current_step_id` or `current_step_phase` equal to `"COMPLETE"`, `format_reminder()` now returns `None` immediately via a terminal-state guard, so the hook emits `{}` instead of a misleading "REQUIRED: Complete phase COMPLETE" banner after a workflow has already finished. Added a `_TERMINAL_STEP_IDS` frozenset constant and regression tests covering both the subprocess-integration and unit (`format_reminder`) paths.
- **`record_test_baseline` timeout is now fail-safe, not fail-open (closes #307).** When the baseline subprocess timed out it never finished, so `baseline_failures` was always `[]` — indistinguishable from a genuinely clean suite, silently treating any pre-existing failure as "not pre-existing" and defeating the regression-vs-pre-existing distinction. Status is now `"timed_out"` (distinct from `"baseline_failures"`); a new `baseline_complete: bool` field is `false` on timeout so downstream code can check it before trusting an empty baseline; `list_baseline_failures` propagates `baseline_complete`/`timed_out` and emits a `warning` key when the stored baseline is incomplete. Default `timeout_seconds` raised from 120 to 600 to give most suites room to finish; `--timeout` still accepts an explicit value.
- **Bare-basename spec citations now auto-resolve instead of hard-failing (closes #301, closes #300).** `validate_spec_citations.py` resolves a bare filename citation (e.g. `api.ts:80`) automatically when it is unique in the repo; an ambiguous bare basename now produces a non-blocking warning instead of a hard error, and a genuinely missing file gets a clearer error message. Separately, `/map-plan` Step 0's research-agent now writes its full report directly to disk (with the pipe-based fallback kept), documenting the `SendMessage` vs. new-`Agent()` footgun for future skill authors.

## [3.20.0] - 2026-06-26

12 changes: 8 additions & 4 deletions src/mapify_cli/templates/skills/map-release/SKILL.md
@@ -198,8 +198,12 @@ if [[ -n "$LAST_TAG" ]]; then
# maintenance commits, which otherwise make this heuristic chase its own fixes.
COMMITS_SINCE=$(git log ${LAST_TAG}..HEAD --no-merges --format="%s" | awk '!/^(docs\(changelog\)|chore\(release\):)/ { count++ } END { print count + 0 }')

-# Count CHANGELOG entries in [Unreleased] section
-CHANGELOG_ENTRIES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | grep -cE "^- " || echo "0")
+# Count CHANGELOG entries in [Unreleased] section.
+# NOTE: a range-pattern awk (/start/,/end/) collapses to the single
+# matching line when start and end match the SAME line — and "##
+# [Unreleased]" matches both "/## \[Unreleased\]/" and "/## \[/". Use an
+# explicit flag instead so the range spans past the heading line itself.
+CHANGELOG_ENTRIES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md | grep -cE "^- " || echo "0")
Comment on lines +201 to +206

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

sed -n '190,230p' src/mapify_cli/templates/skills/map-release/SKILL.md

Repository: azalio/map-framework

Length of output: 2029


Drop the || echo "0" fallback. grep -c already prints 0 when there are no matches, so the fallback turns that into 0\n0 and can break the later numeric comparison. Count in awk or suppress the exit status without printing another value.
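
To illustrate the point, here is a minimal sketch, assuming a hypothetical section body with no `- ` bullets (the `section` variable is made up for the demo). It contrasts the current fallback with one way of suppressing only grep's exit status.

```bash
#!/usr/bin/env bash
# Stand-in for an [Unreleased] section that has no bullet entries yet.
section=$'### Added\n(nothing yet)'

# Current form: grep -c prints "0" and exits with status 1 on no match,
# so the fallback runs too and a second "0" is captured.
doubled=$(printf '%s\n' "$section" | grep -cE "^- " || echo "0")

# Alternative: keep grep's own output and only neutralise the exit status.
single=$(printf '%s\n' "$section" | grep -cE "^- " || true)

# %q makes any embedded newline visible in the printed value.
printf 'with echo fallback: %q\n' "$doubled"
printf 'with true fallback: %q\n' "$single"
```

With `|| true` the variable holds a single number, so a later numeric comparison sees one operand rather than two lines.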

🧰 Tools
🪛 SkillSpector (2.3.7)

[warning] 19: [EA2] Autonomous Decision Making: Skill enables autonomous high-impact decisions without human-in-the-loop verification. Critical operations (destructive commands, financial transactions, data deletion) should require explicit user confirmation.

Remediation: Add human-in-the-loop confirmation for destructive, irreversible, or high-impact operations. Never auto-execute commands that modify files, send data, or alter system state.

(Excessive Agency (EA2))


[error] 1114: [TM1] Tool Parameter Abuse: Tool parameters are crafted to achieve unintended or unsafe behavior. Parameter abuse can bypass intended safety checks (e.g. shell=True, --force, dangerous glob patterns).

Remediation: Validate all tool parameters against an allowlist. Reject dangerous parameter values (shell=True, --force, -rf /) and use safe defaults.

(Tool Misuse (TM1))

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mapify_cli/templates/skills/map-release/SKILL.md` around lines 201 - 206,
The CHANGELOG entry count logic in the SKILL.md template currently appends a
fallback that can emit a duplicate zero and break the numeric check. Update the
Unreleased counting flow around CHANGELOG_ENTRIES so it relies on grep -c alone
or uses awk to compute the count without printing an extra value; keep the
existing Unreleased section extraction logic and remove the fallback that writes
another 0.


echo "Counted commits since $LAST_TAG: $COMMITS_SINCE"
echo "(excluding docs(changelog) and chore(release) maintenance commits)"
@@ -216,7 +220,7 @@ if [[ -n "$LAST_TAG" ]]; then
echo "════════════════════════════════════════════════════════"
echo ""
echo "Current CHANGELOG [Unreleased] content:"
-awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d'
+awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md
echo ""

# Ask user to update CHANGELOG
@@ -289,7 +293,7 @@ Read CHANGELOG.md [Unreleased] section to determine bump type:

```bash
# Extract unreleased changes
-UNRELEASED_CHANGES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d')
+UNRELEASED_CHANGES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md)
```

**Semantic Versioning Rules:**
12 changes: 8 additions & 4 deletions src/mapify_cli/templates_src/skills/map-release/SKILL.md.jinja
@@ -198,8 +198,12 @@ if [[ -n "$LAST_TAG" ]]; then
# maintenance commits, which otherwise make this heuristic chase its own fixes.
COMMITS_SINCE=$(git log ${LAST_TAG}..HEAD --no-merges --format="%s" | awk '!/^(docs\(changelog\)|chore\(release\):)/ { count++ } END { print count + 0 }')

-# Count CHANGELOG entries in [Unreleased] section
-CHANGELOG_ENTRIES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | grep -cE "^- " || echo "0")
+# Count CHANGELOG entries in [Unreleased] section.
+# NOTE: a range-pattern awk (/start/,/end/) collapses to the single
+# matching line when start and end match the SAME line — and "##
+# [Unreleased]" matches both "/## \[Unreleased\]/" and "/## \[/". Use an
+# explicit flag instead so the range spans past the heading line itself.
+CHANGELOG_ENTRIES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md | grep -cE "^- " || echo "0")
Comment on lines +201 to +206

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file='src/mapify_cli/templates_src/skills/map-release/SKILL.md.jinja'

# Show the relevant section with line numbers
sed -n '190,215p' "$file" | cat -n

# Inspect how CHANGELOG_ENTRIES is later used
rg -n "CHANGELOG_ENTRIES|CHANGELOG.md|grep -cE \"^- \"" "$file"

Repository: azalio/map-framework

Length of output: 2029


Drop the || echo "0" fallback. grep -c already prints 0; the fallback appends a second 0, so CHANGELOG_ENTRIES can become 0\n0 and break the numeric check. Count directly in awk, or keep grep's output and suppress only its exit status.
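
The "count directly in awk" option could look roughly like the sketch below. `count_unreleased` is a hypothetical helper name and the two sample files are made up for the demo; the awk program reuses the flag pattern from this PR and the `+ 0` idiom already used for `COMMITS_SINCE`.

```bash
#!/usr/bin/env bash
# count_unreleased is a hypothetical helper, not part of the skill.
count_unreleased() {
  awk '
    /^## \[Unreleased\]/ { f = 1; next }   # start counting after the heading
    /^## \[/             { f = 0 }         # any later version heading stops it
    f && /^- /           { n++ }           # top-level bullet inside the section
    END                  { print n + 0 }   # n + 0 prints 0 when nothing matched
  ' "$1"
}

# Made-up sample files: one with two unreleased entries, one with none.
tmpdir=$(mktemp -d)
printf '%s\n' '## [Unreleased]' '### Added' '- one' '- two' '## [1.0.0]' '- old' > "$tmpdir/full.md"
printf '%s\n' '## [Unreleased]' '' '## [1.0.0]' '- old' > "$tmpdir/empty.md"

with_entries=$(count_unreleased "$tmpdir/full.md")
no_entries=$(count_unreleased "$tmpdir/empty.md")
echo "with entries: $with_entries, empty section: $no_entries"

rm -rf "$tmpdir"
```

This variant removes the grep stage entirely, so there is no nonzero exit status to handle and the captured value is always a single number.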

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mapify_cli/templates_src/skills/map-release/SKILL.md.jinja` around lines
201 - 206, The CHANGELOG_ENTRIES calculation in the map-release SKILL template
is adding a redundant fallback that can turn an empty count into duplicate
output. Remove the `|| echo "0"` behavior from the Unreleased-section counting
logic, and either count matching entries directly in `awk` or keep using `grep
-cE` while only suppressing its nonzero exit status so `CHANGELOG_ENTRIES` stays
a single numeric value.


echo "Counted commits since $LAST_TAG: $COMMITS_SINCE"
echo "(excluding docs(changelog) and chore(release) maintenance commits)"
@@ -216,7 +220,7 @@ if [[ -n "$LAST_TAG" ]]; then
echo "════════════════════════════════════════════════════════"
echo ""
echo "Current CHANGELOG [Unreleased] content:"
-awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d'
+awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md
echo ""

# Ask user to update CHANGELOG
@@ -289,7 +293,7 @@ Read CHANGELOG.md [Unreleased] section to determine bump type:

```bash
# Extract unreleased changes
-UNRELEASED_CHANGES=$(awk '/## \[Unreleased\]/,/## \[/' CHANGELOG.md | sed '$d')
+UNRELEASED_CHANGES=$(awk '/^## \[Unreleased\]/{f=1;next} /^## \[/{f=0} f' CHANGELOG.md)
```

**Semantic Versioning Rules:**