Skip to content

feat(map-review-codex): port map-review skill to the Codex provider#322

Merged
azalio merged 9 commits into
mainfrom
feat/map-review-codex-port
Jul 2, 2026
Merged

feat(map-review-codex): port map-review skill to the Codex provider#322
azalio merged 9 commits into
mainfrom
feat/map-review-codex-port

Conversation

@azalio

@azalio azalio commented Jul 2, 2026

Copy link
Copy Markdown
Owner

Summary

  • Ports the Claude-only map-review skill to the Codex provider ($map-review), reaching feature parity with map-plan/map-efficient: normal mode dispatches monitor/predictor/evaluator via spawn_agent(agent_type=...), adversarial mode runs three sequential in-session passes (no new agent-dispatch primitive), and --cross-ai <runtime> reuses run_cross_ai_review verbatim (secret-scan/injection-detection trust boundary untouched).
  • Adds two new Codex agent configs (predictor.toml, evaluator.toml, condensed from the canonical Claude prompts) and registers them in config.toml; ships the skill as SKILL.md + review-reference.md + adversarial-reference.md, mirroring the map-efficient split-reference pattern.
  • The shared provider-neutral runtime (map_step_runner.py) is reused unmodified — zero logic changes, verified via git diff across the whole change range.
  • Docs: $map-review row in Codex AGENTS.md, a CHANGELOG entry, and a corrected RELEASING.md upgrade note (the Codex skill install path is .agents/skills/, not .codex/skills/).
  • Captures three new lessons from this workflow into .claude/rules/learned/ (Monitor scope-correction via same-thread re-argument, three-way spec/source/output header drift, shared-fixture staleness surfacing at a later subtask than its origin).

Test plan

  • make check-render — zero diff across all generated trees
  • pytest tests/test_mapify_cli.py::TestCodexProvider::test_ac10_no_claude_refs_anywhere -q — no Claude-only tokens leaked into Codex output
  • pytest tests/test_mapify_cli.py -k map_review -v — scoped existence test green (no longer skip-gated)
  • Full suite: pytest -q — 3236 passed, 0 failed, 3 pre-existing skips (no regressions vs pre-plan baseline)
  • git diff confirms src/mapify_cli/templates_src/map/scripts/map_step_runner.py.jinja untouched across the whole change

Summary by CodeRabbit

  • New Features

    • Added the $map-review skill to Codex, including interactive review, adversarial review, and cross-review support.
    • Added new reviewer and evaluator capabilities to the Codex setup.
  • Documentation

    • Added detailed user-facing guidance for review modes, flags, workflow steps, and troubleshooting.
    • Updated release and upgrade instructions to point to the new skill layout.
  • Tests

    • Added an acceptance test to verify the new map-review skill files are available in both templates and initialized projects.

azalio added 8 commits July 2, 2026 14:05
…nfigs

Port predictor.md.jinja and evaluator.md.jinja to 3-key Codex TOML
schema, condensed while preserving tiered-triage/risk enums and the
seven-dimension weighted rubric with minimality-gated deletion lens.
… config.toml

Add [agents.predictor] and [agents.evaluator] entries so spawn_agent
can reference them; matches the existing decomposer/monitor/researcher
entry shape.
Skeleton with name+description frontmatter, $map-review invocation,
and mode-routing hooks (normal/adversarial/cross-ai) pointing at
review-reference.md and adversarial-reference.md, which land in
ST-006 and ST-005 respectively. Zero Claude-only API tokens.
…p-review skill

Positive test only for map-review (not a generic Claude->Codex parity
gate); skipped until review-reference.md/adversarial-reference.md land
via ST-005/ST-006.
…ss-ai wiring

adversarial-reference.md.jinja ports Blind Hunter / Edge Case Hunter /
Acceptance Auditor as sequential in-session passes (no spawn_agent()),
with an explicit unverified-assumption note about ad-hoc agent
spawning. SKILL.md.jinja gains --cross-ai handling that reuses
run_cross_ai_review verbatim (no reimplemented secret-scan/injection
logic), preserves fall-through-never-hard-stop, and keeps
--ci/--auto/--detached plus the --compare-orderings/--shuffle-sections
mutual exclusion.
…dispatch

review-reference.md.jinja ports section rubrics, compare-orderings, and
the minimality-gated what-to-delete lens. SKILL.md.jinja gains the
normal-mode 4-section walkthrough (## Architecture/Code Quality/Tests/
Performance) with spawn_agent(agent_type=monitor|predictor|evaluator)
dispatch, a single PROCEED|REVISE|BLOCK Final Verdict, and verbatim
reuse of the 15 provider-neutral review CLI verbs. Also refreshes the
tests/fixtures/codex/config.toml golden fixture that went stale after
ST-001/ST-002 registered predictor/evaluator.
Add $map-review row to the Codex AGENTS.md skill table, a CHANGELOG
entry for the port, and correct the stale RELEASING.md upgrade note
that referenced a nonexistent .codex/skills/ path (Codex skills
install under .agents/skills/).
Records three new lessons from the 7-subtask /map-efficient run:
Monitor scope-correction via same-thread re-argument for forward-
reference false positives, three-way spec/source/output header text
drift resolution, and shared golden-fixture staleness surfacing at a
later subtask than its origin.
@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@azalio, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 26 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: cb6c53ca-fd08-45bb-9eb0-62c85e5eb83e

📥 Commits

Reviewing files that changed from the base of the PR and between 5e15b3a and 522db38.

📒 Files selected for processing (10)
  • .agents/skills/map-review/SKILL.md
  • .agents/skills/map-review/review-reference.md
  • .codex/agents/evaluator.toml
  • src/mapify_cli/templates/codex/agents/evaluator.toml
  • src/mapify_cli/templates/codex/skills/map-review/SKILL.md
  • src/mapify_cli/templates/codex/skills/map-review/review-reference.md
  • src/mapify_cli/templates_src/codex/agents/evaluator.toml.jinja
  • src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja
  • src/mapify_cli/templates_src/codex/skills/map-review/review-reference.md.jinja
  • tests/test_mapify_cli.py
📝 Walkthrough

Walkthrough

This PR ports the /map-review skill to the Codex provider. It adds SKILL.md, review-reference.md, and adversarial-reference.md documentation, new predictor and evaluator agent TOML configs, wires them into config.toml/AGENTS.md across canonical, templates, and templates_src sources, plus an acceptance test and doc/rule updates.

Changes

Map-review Codex Port

Layer / File(s) Summary
SKILL.md workflow definition
.agents/skills/map-review/SKILL.md, src/mapify_cli/templates/codex/skills/map-review/SKILL.md, src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja
Defines flag parsing, precheck, bundle/prompt building, agent dispatch, truncation/verification gates, cross-AI/adversarial branching, presentation, and handoff artifacts.
Adversarial reference doc
.agents/skills/map-review/adversarial-reference.md, src/mapify_cli/templates/codex/skills/map-review/adversarial-reference.md, src/mapify_cli/templates_src/codex/skills/map-review/adversarial-reference.md.jinja
Documents the three-pass sequential adversarial workflow, JSON validation/retry, aggregation, unified report, and verdict rules.
Review reference doc
.agents/skills/map-review/review-reference.md, src/mapify_cli/templates/codex/skills/map-review/review-reference.md, src/mapify_cli/templates_src/codex/skills/map-review/review-reference.md.jinja
Documents review modes, flag parsing, agent dispatch, verification gate, section ordering, What-To-Delete Lens, compare-orderings, and handoff artifacts.
Evaluator agent config
.codex/agents/evaluator.toml, src/mapify_cli/templates/codex/agents/evaluator.toml, src/mapify_cli/templates_src/codex/agents/evaluator.toml.jinja
Defines seven-dimension scoring rubric, auto-fail rules, decision tree, complexity lens mode, and JSON output contract.
Predictor agent config
.codex/agents/predictor.toml, src/mapify_cli/templates/codex/agents/predictor.toml, src/mapify_cli/templates_src/codex/agents/predictor.toml.jinja
Defines impact-analysis workflow, risk rubric, edge-case checklist, confidence scoring, and required JSON output.
Agent wiring and skill listing
.codex/config.toml, src/mapify_cli/templates/codex/config.toml, src/mapify_cli/templates_src/codex/config.toml.jinja, tests/fixtures/codex/config.toml, .codex/AGENTS.md, src/mapify_cli/templates/codex/AGENTS.md, src/mapify_cli/templates_src/codex/AGENTS.md.jinja
Registers predictor/evaluator agents and lists the $map-review skill.
Tests, changelog, release docs, learned rules
tests/test_mapify_cli.py, CHANGELOG.md, RELEASING.md, .claude/rules/learned/*
Adds acceptance test for skill files, documents the port, updates upgrade instructions, and appends learned rules.

Estimated code review effort: 3 (Moderate) | ~25 minutes

Poem

A rabbit hops through Codex's den,
Porting skills with a copy-paste pen 🐇
Monitor, Predictor, Evaluator too,
Adversarial hunters chase what's askew,
Three passes done, verdict in sight—
Ship it, revise it, or block it tonight!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly states the main change: porting the map-review skill to the Codex provider.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/map-review-codex-port

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (1)
src/mapify_cli/templates/codex/agents/evaluator.toml (1)

230-231: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Scoring floor contradicts calibration text.

Line 231 states scores are "integers 1-10 (never 0, ...)" but the per-dimension rubric repeatedly defines a "0-1" bucket (e.g. Lines 46, 68, 87, 96, 107) implying 0 is a valid conceptual score. This ambiguity could cause the evaluator to either violate the "never 0" rule when a dimension is genuinely non-functional, or clamp a legitimately terrible score up to 1 inconsistently with the rubric text.

✏️ Suggested clarification

Apply the same floor clarification to the other "0-1" bucket descriptions (Lines 68, 87, 96, 107).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mapify_cli/templates/codex/agents/evaluator.toml` around lines 230 - 231,
The scoring rules in evaluator.toml are inconsistent: the validation text says
dimension scores are integers 1-10 and never 0, while the rubric buckets still
describe a 0-1 range. Update the validation/rubric language so the floor is
unambiguous, and make the bucket descriptions in the affected scoring sections
(including the ones with the 0-1 bucket) match the intended minimum score. Keep
the terminology consistent across the relevant rubric entries and the overall
validation rules.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.agents/skills/map-review/adversarial-reference.md:
- Around line 105-108: The validation step for truncated adversarial output is
using the wrong monitor identifier, so it may not exercise the documented
reviewer monitor path. Update the `map_step_runner.py
detect_truncated_agent_output` invocation to use `monitor` instead of
`review-monitor`, matching the expected monitor kind used elsewhere in the
`map-review` flow.

In @.agents/skills/map-review/SKILL.md:
- Around line 181-184: The truncation check is invoking
detect_truncated_agent_output with the wrong agent identifier, so update the
SKILL.md command to use the documented monitor path via the existing
map_step_runner call. Keep the pipeline structure unchanged, but replace the
review-monitor argument with the correct monitor value so the gate uses the
intended parsing behavior.

In @.codex/agents/evaluator.toml:
- Around line 40-114: The scoring rubric is inconsistent because each dimension
includes a 0-1 bucket while the output rule says scores must be integers 1-10.
Update the rubric in evaluator.toml so the lowest bucket starts at 1-2 or
otherwise aligns with the 1-10 constraint, and make sure the summary text for
each dimension (like Functionality, Completeness, Security, etc.) no longer
implies 0 is valid.

In
`@src/mapify_cli/templates_src/codex/skills/map-review/review-reference.md.jinja`:
- Around line 20-27: The review-mode JSON is being built by interpolating
SIBLING_HINT and REVIEW_MODE directly into an echo string, so unescaped
commit-message text can break parsing. Update the review-mode.json generation
step in the review-reference template to serialize the payload safely the same
way the other workflow steps do, instead of hand-assembling JSON. Keep the
existing SIBLING_HINT extraction from git log, but ensure the final write to
.map/$BRANCH/review-mode.json escapes quotes, backslashes, and newlines
correctly.
- Around line 92-121: The dispatch guidance in the review reference is
inconsistent with the documented full vs lightweight mode split, and the bare
code fence also triggers markdownlint. Update the section around the
`spawn_agent(...)` examples so it clearly marks `monitor` as always run and
`predictor`/`evaluator` as full-mode only, matching the `Dispatch` prose and the
`COMPLEXITY_LENS_ENABLED` conditional. Also add a language tag to the fenced
block so the markdown stays lint-clean while preserving the referenced agent
symbols and output-contract wording.

In `@src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja`:
- Around line 55-68: Clarify the execution rule around launchReviewerAgents so
it does not prohibit the required truncated-output retry in Step A.2b. Update
the wording in the map-review SKILL template to state that reviewer agents are
launched once per review run initially, but a single retry is still allowed when
truncated output is detected. Keep the order and intent of the phase list intact
while making the retry exception explicit.
- Around line 141-168: The dispatch example in SKILL.md.jinja contradicts the
lightweight/full mode description, so update the example to show mode-dependent
branching: monitor only for lightweight mode, and monitor plus predictor plus
evaluator for full mode, using the same spawn_agent(agent_type=...) symbols
already referenced. Keep the complexity_lens dispatch under its existing
COMPLEXITY_LENS_ENABLED guard, and add a language identifier to the fenced code
block to match the markdownlint expectation.

In `@tests/test_mapify_cli.py`:
- Around line 1728-1778: The AC number for this map-review skill existence test
is reused and conflicts with the existing AC-9 section, making failures
ambiguous. Update the criterion identifier in
test_ac09_codex_map_review_skill_exists to a distinct AC number that is not
already used elsewhere in tests/test_mapify_cli.py, and keep the assertion logic
for the templates_dir and agents_skills_dir checks unchanged.

---

Nitpick comments:
In `@src/mapify_cli/templates/codex/agents/evaluator.toml`:
- Around line 230-231: The scoring rules in evaluator.toml are inconsistent: the
validation text says dimension scores are integers 1-10 and never 0, while the
rubric buckets still describe a 0-1 range. Update the validation/rubric language
so the floor is unambiguous, and make the bucket descriptions in the affected
scoring sections (including the ones with the 0-1 bucket) match the intended
minimum score. Keep the terminology consistent across the relevant rubric
entries and the overall validation rules.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8eb5d994-901a-4952-8b1c-d79e795542be

📥 Commits

Reviewing files that changed from the base of the PR and between 74148ec and 5e15b3a.

📒 Files selected for processing (27)
  • .agents/skills/map-review/SKILL.md
  • .agents/skills/map-review/adversarial-reference.md
  • .agents/skills/map-review/review-reference.md
  • .claude/rules/learned/architecture-patterns.md
  • .claude/rules/learned/testing-strategies.md
  • .codex/AGENTS.md
  • .codex/agents/evaluator.toml
  • .codex/agents/predictor.toml
  • .codex/config.toml
  • CHANGELOG.md
  • RELEASING.md
  • src/mapify_cli/templates/codex/AGENTS.md
  • src/mapify_cli/templates/codex/agents/evaluator.toml
  • src/mapify_cli/templates/codex/agents/predictor.toml
  • src/mapify_cli/templates/codex/config.toml
  • src/mapify_cli/templates/codex/skills/map-review/SKILL.md
  • src/mapify_cli/templates/codex/skills/map-review/adversarial-reference.md
  • src/mapify_cli/templates/codex/skills/map-review/review-reference.md
  • src/mapify_cli/templates_src/codex/AGENTS.md.jinja
  • src/mapify_cli/templates_src/codex/agents/evaluator.toml.jinja
  • src/mapify_cli/templates_src/codex/agents/predictor.toml.jinja
  • src/mapify_cli/templates_src/codex/config.toml.jinja
  • src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja
  • src/mapify_cli/templates_src/codex/skills/map-review/adversarial-reference.md.jinja
  • src/mapify_cli/templates_src/codex/skills/map-review/review-reference.md.jinja
  • tests/fixtures/codex/config.toml
  • tests/test_mapify_cli.py

Comment thread .agents/skills/map-review/adversarial-reference.md
Comment thread .agents/skills/map-review/SKILL.md
Comment thread .codex/agents/evaluator.toml
Comment thread src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja
Comment thread src/mapify_cli/templates_src/codex/skills/map-review/SKILL.md.jinja
Comment thread tests/test_mapify_cli.py
- Serialize review-mode.json via python3 -c (env-var passed, no shell
  string interpolation) instead of hand-built echo JSON, so a commit
  message containing a quote/backslash in SIBLING_HINT can no longer
  corrupt the file. Avoid nested double-quotes inside an f-string
  expression (Python 3.11 rejects that syntax; PEP 701 relaxed it only
  in 3.12+) by binding the branch to a local variable first.
- Make the full/lightweight mode split explicit on the spawn_agent
  dispatch examples in SKILL.md.jinja and review-reference.md.jinja
  (predictor/evaluator calls now carry a "full mode only" comment), add
  a language tag to the previously-bare fence, and restore the dropped
  full/lightweight qualifier + truncation-retry exception to Execution
  Rule 6 that ST-003's condensation had silently narrowed to "exactly
  once".
- Note evaluator.toml.jinja's "never 0" scoring rule against its "0-1"
  rubric bucket floor (inherited from the Claude source's identical
  wording; scoped a cheap clarifying note to this file only).
- Disambiguate the AC-9 label on the new map-review existence test from
  the pre-existing map-plan AC-9 test in the same file.

`--agent review-monitor` (also flagged by the bot) is intentional, not a
bug: map_step_runner.py.jinja defines it as a distinct, richer schema
from `monitor`, and the Claude source (SKILL.md:342) uses the same
value for the same purpose — left unchanged.
@azalio azalio merged commit 6442317 into main Jul 2, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant