Releases: azalio/map-framework
MAP Framework v3.21.0
See CHANGELOG.md for details.
MAP Framework v3.20.0
[3.20.0] - 2026-06-26
Added
- Automatic cleanup of MAP-internal workflow IDs from shipped code. At workflow completion (WORKFLOW_COMPLETE), the new scrub-internal-ids.py Stop hook strips leaked internal identifiers (subtask ST-001, acceptance criteria AC-3, verification criteria VC1, invariants INV-7, hard constraints HC-1) that an Actor wrote into the code a run changed, whether as comments (// The rule (INV-7) is:) or as test names (test_vc1_* → test_*). The deterministic engine (.map/scripts/scrub_internal_ids.py) is hard-scoped to the run's git diff (only files the run changed, only the lines it added; pre-existing IDs on untouched lines are never modified) and to recognized source files using each language's comment syntax (#, // + /* */, <!-- -->, …). It strips ID tokens inside comments (deleting pure-marker comment lines) and renames vc<n> test identifiers with a collision guard. IDs in code, string literals, docstrings, and data files (.json, …) are deliberately left intact and only reported: stripping a string substring would corrupt legitimate values (e.g. "INV-7-special-sku"), and # is a heading, not a comment, in markdown. It then re-scans for residual IDs, commits the cleanup as a dedicated chore(map): strip internal workflow IDs commit, runs exactly once per completed run, no-ops outside a completed run, honors MAP_INVOKED_BY, and can be disabled with scrub_internal_ids: false in .map/config.yaml. The Actor prompt now also forbids writing these IDs into comments/strings (the transient test_vc<n> grep aid stays during the run and is renamed at close). Claude provider only: the Codex hook model has no Stop event; the shared engine ships to .map/scripts/ regardless.
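The comment-only scrubbing rule above can be pictured with a minimal sketch. This is not the shipped scrub_internal_ids.py: the regexes, the function names, and the naive split on the comment marker (which assumes the marker never appears inside a string literal on the same line) are all assumptions made for illustration.

```python
import re
from typing import Optional, Set

# Hypothetical pattern for the ID families named in the entry above
# (ST-001, AC-3, VC1, INV-7, HC-1), with optional surrounding parentheses.
ID_TOKEN = re.compile(r"\(?\b(?:ST|AC|VC|INV|HC)-?\d+\b\)?")
# Hypothetical pattern for the transient test_vc<n>_ prefix.
TEST_NAME = re.compile(r"\btest_vc\d+_(\w+)")


def scrub_comment(line: str, marker: str = "#") -> Optional[str]:
    """Strip ID tokens from the comment part of one added line.

    Returns None when the line was a pure-marker comment and should be deleted.
    Code and string literals are never rewritten.
    """
    code, sep, comment = line.partition(marker)
    if not sep:
        return line  # no comment here: leave code and strings untouched
    cleaned = re.sub(r"\s+", " ", ID_TOKEN.sub("", comment)).strip()
    if not cleaned:
        # Nothing but IDs in the comment: drop the line, or keep only the code.
        return code.rstrip() if code.strip() else None
    return f"{code}{marker} {cleaned}"


def rename_test(name: str, existing: Set[str]) -> str:
    """Rename test_vc<n>_foo to test_foo unless that name is already taken."""
    new_name = TEST_NAME.sub(r"test_\1", name)
    return name if new_name in existing else new_name
```

A string such as "INV-7-special-sku" passes through unchanged because the sketch only ever touches text after the comment marker.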
MAP Framework v3.19.0
Fixed
- Blueprint affected-files refresh no longer shrinks approved subtask scope after resume (closes #273). refresh_blueprint_affected_files now merges the computed actual delta into existing affected_files by default, preserving files that were already approved but excluded from the per-subtask baseline. The old destructive rewrite behavior remains available only via explicit --replace, and reports now expose both actual and mode.
- Plan resume requires an overlap floor, not containment alone (closes #274). check_plan_resume() no longer returns a false resume verdict (with the dangerous "existing step_state ⇒ plan complete, print checkpoint and STOP" recommendation) when the new goal merely overlaps a contained-but-near-zero existing plan. The verdict now requires a minimum goal overlap in addition to containment, so a genuinely different goal starts fresh instead of silently resuming a stale plan.
- Codex hooks.json no longer carries an unsupported _map_managed top-level key (closes #270). The codex hooks-JSON generator merged MAP metadata as a top-level _map_managed object, which the Codex runtime rejects. The generator now writes only the supported hooks structure and keeps MAP's managed-merge bookkeeping out of the emitted file.
- /map-efficient resume prefers blueprint.json for ordered subtask IDs (closes #264). resume_from_plan, resume_single_subtask, resume_from_test_contract, and get_plan_progress now read ordered ST-XXX IDs from blueprint.json first, falling back to markdown task_plan parsing (including the /map-plan table layout with IDs in the first column). set_subtasks normalizes whitespace-joined arguments and rejects malformed subtask IDs, so resume no longer fails to parse a well-formed plan or forces a manual set_subtasks.
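To make the "overlap floor, not containment alone" fix concrete, here is a small sketch under stated assumptions. The token-set representation, the function name should_resume, and both thresholds are hypothetical; the real check_plan_resume() may measure overlap differently.

```python
from typing import Set


def should_resume(
    goal_tokens: Set[str],
    plan_tokens: Set[str],
    min_containment: float = 0.8,   # assumed threshold
    min_goal_overlap: float = 0.3,  # assumed floor
) -> bool:
    """Resume only when the plan is contained in the goal AND covers enough of it."""
    if not goal_tokens or not plan_tokens:
        return False
    shared = goal_tokens & plan_tokens
    containment = len(shared) / len(plan_tokens)   # how much of the plan the goal contains
    goal_overlap = len(shared) / len(goal_tokens)  # how much of the goal the plan covers
    return containment >= min_containment and goal_overlap >= min_goal_overlap


tiny_plan = {"add", "logging"}
big_goal = {"add", "logging", "rewrite", "auth", "module", "with", "oauth", "and", "new", "tests"}
# Containment is 1.0, but the plan covers only 20% of the goal, so start fresh.
print(should_resume(big_goal, tiny_plan))
```

Containment alone would have resumed the stale two-token plan here; the floor on goal overlap is what rejects it.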
Added
- Opt-in --autonomy posture for mapify init (claude provider). mapify init --autonomy writes a "YOLO-minus-git" permission set into the per-user, gitignored .claude/settings.local.json: broad auto-approve (Bash(*), Read/Edit/Write/MultiEdit/Glob/Grep/LS(*)) plus a Bash(git commit:*) / Bash(git push:*) deny. The committed team .claude/settings.json stays the secure curated baseline. Because the permission-level git deny is bypassable under a broad Bash(*) allow (bash -c 'git commit' matches as bash, not git commit), enforcement lives in the safety-guardrails.py PreToolUse hook: it now hard-blocks git commit / git push (including shell-wrapped and chained forms) gated on a mapify.autonomy sentinel the installer writes beside the permissions, so posture and permissions can't drift apart and the standard commit workflow is never broken for non-autonomy users. --no-autonomy cleanly removes the block; omitting the flag leaves any existing local posture untouched on re-init. mapify init --autonomy also gitignores .claude/settings.local.json so the personal posture can't leak to the team. The codex provider ignores the flag (it installs neither file). Design was llm-council-reviewed (per-user opt-in over team/global default; hook enforcement over permission-deny-alone; sentinel embedded in settings.local.json).
- Merge-conflict resolution guardrail (closes #256). The workflow-context injector now surfaces MAP conflict-resolution guidance during a git merge/rebase preflight and whenever the index holds active unmerged paths. It detects conflicted paths read-only via git diff --name-only --diff-filter=U -z -- (the existing step_state.json gate is preserved) and documents the per-file, intent-preserving, test-after-each-batch protocol so conflicts are resolved one file at a time rather than with bulk overwrites.
- defer_flaky_subtask orchestrator command for validated flaky Monitor outcomes (closes #252). When a Monitor verdict is an explicit deferred_nondeterministic outcome, the orchestrator can now persist the non-green flaky evidence metadata in step_state.json and advance without requeueing Actor, preserving run-health completion parity instead of grinding the retry loop on a known-flaky check.
- run_flaky_test_triage repeat runner for /map-efficient (part of #252). New step-runner subcommand repeats an exact argv command with shell=False and records flaky-test evidence automatically into flaky_test_triage.json, preserving bounded stdout/stderr tails, timeout, and duration evidence. No shell interpretation by default (shell behavior requires explicit argv such as bash -lc); output tails are tempfile-backed to avoid unbounded in-memory capture. Design was llm-council-reviewed (argv-based runner; core Monitor/orchestrator state-machine integration deferred).
- Qualitative convergence sidecar for high-risk Monitor/self-review gates (issue #257). New step-runner subcommands record_qualitative_convergence <gate-id> <pass-json> [--scope monitor|self_review] [--required-clean-passes N] [--max-passes N] and validate_qualitative_convergence [path] persist append-only qualitative review passes in .map/<branch>/qualitative_convergence.json and register the qualitative_convergence manifest stage. Validation re-derives the tail clean streak from the pass log (clean, dirty, clean with K=2 is not converged), rejects clean=true with critical findings, requires evidence even for clean passes, and treats max_passes_exceeded as a hard stop/escalation rather than a pass. Scope is deliberately limited to qualitative monitor/self_review; deterministic build/test/lint gates remain single-pass. Part of #251.
- Flaky-test triage artifact for /map-efficient (issue #252). New step-runner subcommands record_flaky_test_triage <check-id> <outcomes-json> [--command ...] [--reason ...] [--branch ...] and validate_flaky_test_triage [path] persist repeated check outcomes in .map/<branch>/flaky_test_triage.json and register the flaky_test_triage manifest stage. Mixed pass/fail repetitions classify as disposition:"deferred_nondeterministic" with monitor_verdict_policy:"not_valid_without_explicit_triage" and operator requirements that forbid weakening, skipping, deleting, or treating the artifact as a passing gate. All-failing repetitions classify as deterministic_failure; all-passing repetitions classify as not_reproduced. Package schemas and run-health artifact inventory now understand the new artifact, while keeping the core Monitor/orchestrator binary verdict path unchanged. Part of #251.
- Intra-run failure memory for the Actor→Monitor retry loop (issue #253). When Monitor rejects the same subtask the same way twice, /map-efficient now injects a binding anti-stagnation constraint into the next Actor attempt so the loop stops re-walking a dead end (token burn / identical rejected diffs). Four deterministic step-runner subcommands implement it: record_failure_signature "<feedback>" <subtask_id> [--source monitor_rejection|test_failure|gate_failure] conservatively normalizes the failure (strips line numbers, absolute-path prefixes, hex/uuid/addresses, timestamps, ANSI; preserves exception types, file basenames, symbol/test names, assertion text), hashes it, and arms on the 2nd identical signature; build_anti_repeat_constraint <subtask_id> [--quarantine-active] renders the <intra_run_failure_memory>-delimited block (shows the human-readable sample, never the hash) and returns empty when nothing is armed or a CLEAN_RETRY quarantine is active that iteration; set_anti_repeat_subtask_status <subtask_id> succeeded|failed|escalated records the terminal disposition; collect_anti_repeat_learn_candidates feeds write_learning_handoff so armed signs from non-succeeded subtasks become /map-learn candidates (a subtask that eventually passed is excluded, since it found a way through). The constraint is anti-stagnation, not anti-approach: it binds the next delta to resolve the repeated failure, never bans a whole approach. Generic rejections with no concrete anchor ("tests still fail") are recorded as low_specificity and never arm. At the 3rd identical failure the record sets escalation_recommended=true as a SIGNAL only: bounded-effort escalation (#255) owns the stop decision; this slice never skips the Actor call. Durable store: .map/<branch>/anti_repeat.json + anti_repeat manifest stage; thresholds are env-tunable (MAP_ANTI_REPEAT_ARM_THRESHOLD, MAP_ANTI_REPEAT_ESCALATE_THRESHOLD). Complements, and never duplicates, log_agent_failure (FORMAT failures only) and retry_quarantine (one-shot CLEAN_RETRY). Design was llm-council-reviewed (hard anti-stagnation + generic-failure guard + per-subtask scoping + CLEAN_RETRY suppression). Part of #251.
- Bounded-effort escalation: "act once, then escalate" (issue #255). Turns the #253 escalation_recommended SIGNAL (previously written but never consumed) and the orchestrator's max_retries hard cap into ONE deterministic terminal outcome instead of grinding the Actor→Monitor loop to the ceiling on a dead end. New step-runner subcommand build_escalation_outcome <subtask_id> <reason> [--retry-count N --max-retries M] [--quarantine-active] (reason ∈ repeated_failure | max_retries) emits a structured {status:"escalated", outcome, reason_code, attempts, blocker_summary, repeated_failures, recommended_action}, sets the subtask's anti-repeat status to escalated, writes a durable human-readable .map/<branch>/escalation_<subtask>.md blocker report, and registers a new escalation manifest stage. The outcome splits on cause: a 3rd identical failure short-circuits to outcome:"BLOCKED" (the constraint armed at the 2nd identical failure was the single bounded recovery act, so the legacy retry-3 Stuck-Recovery is bypassed for identical-failure loops, kept for non-identical stuckness), while budget exhaustion across differing failures is outcome:"CLARIFICATION_NEEDED". The stop is re-derived from the anti_repeat store IN...
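Two of the deterministic rules in the entries above, the tail clean streak for qualitative convergence and the three-way flaky-test disposition, are small enough to sketch. The function names and the boolean list representation are assumptions; only the disposition strings and the "clean, dirty, clean with K=2 is not converged" rule come from the notes.

```python
from typing import List


def tail_clean_streak(passes: List[bool]) -> int:
    """Count consecutive clean passes at the END of the append-only pass log."""
    streak = 0
    for clean in reversed(passes):
        if not clean:
            break
        streak += 1
    return streak


def is_converged(passes: List[bool], required_clean_passes: int = 2) -> bool:
    """Converged only when the trailing streak reaches K; earlier clean passes do not count."""
    return tail_clean_streak(passes) >= required_clean_passes


def classify_flaky(outcomes: List[bool]) -> str:
    """Map repeated pass/fail outcomes of one check to a triage disposition."""
    if not outcomes:
        raise ValueError("at least one repetition is required")  # assumed behavior
    if all(outcomes):
        return "not_reproduced"
    if not any(outcomes):
        return "deterministic_failure"
    return "deferred_nondeterministic"


print(is_converged([True, False, True]))     # clean, dirty, clean with K=2
print(classify_flaky([True, False, True]))   # mixed repetitions
```

Re-deriving the streak from the log, rather than trusting a stored counter, is what keeps a dirty pass in the middle from being papered over.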
MAP Framework v3.18.0
Fixed
- PyYAML promoted to a hard runtime dependency (closes #245). pyyaml was
declared only in the test/dev optional groups, so a normal install
(uv tool install / pipx / pip install mapify-cli without extras) shipped
without PyYAML. project_config.load_map_config then hit ImportError, warned
once, and silently fell back to default config: the user's entire
.map/config.yaml (minimality, profile, compression_policy, thresholds,
language, prompt_layering, …) was ignored by every config-dependent CLI path
(minimality-report, mapify init compression/sofa overrides). CI never caught
it because the dev/test groups do include pyyaml. Fix adds pyyaml>=6.0.0 to
[project].dependencies in pyproject.toml. (The .map/scripts/map_step_runner.py
runner was unaffected, since it reads config via a stdlib-only scalar parser, which is
why the defect hid.) Regression tests assert pyyaml is in the runtime dependency
table and that a non-default .map/config.yaml value actually loads.
Changed
- Prompt layering resolved as cache-neutral; docs_first stays the default (closes #231).
The remaining field-gated step of #231 (measure docs_first vs stable_first
on a real multi-subtask run, then maybe flip the global default) is resolved
on mechanism, not a fabricated measurement. Anthropic prompt caching writes
a cache entry only at an explicit cache_control breakpoint on a content-block
boundary, and hits require a byte-identical prefix up to that block; Claude Code's
Task tool owns the API call and all breakpoint placement, and MAP joins its
sections into one user-message string, so the stable/variable seam lives mid-block
and can never become a cache boundary. The only byte-identical cross-dispatch
prefix (tools + role system prompt) is independent of prompt_layering, so both
modes benefit equally. Therefore stable_first yields no incremental prefix-cache
hit under the current Claude Code Task architecture. No behavior change: the
global default stays docs_first; stable_first remains opt-in and is not a
behavior no-op (it still changes token order/attention) and is never silently
remapped. docs/ARCHITECTURE.md, docs/USAGE.md, the MapConfig.prompt_layering
comment, the generated .map/config.yaml comment, the map_step_runner.py layering
comments, and tests/test_prompt_layering.py were de-overclaimed accordingly, and
re-open triggers were recorded. No token_accounting.json figures were fabricated:
per-subagent cache usage is harness-owned and not observable to MAP for Task
dispatches, which is exactly why an end-to-end run is a poor test of this hypothesis.
- Global minimality default flipped off → lite (Phase 3, closes #183).
The promotion gate (mapify minimality-report) reached candidate and the
manual review gate passed against field telemetry, so the keyless default now
resolves to lite instead of off at BOTH layers: MapConfig.minimality
(src/mapify_cli/config/project_config.py) and the runner's
_load_minimality_level (map_step_runner.py). Projects that omit the key now
get the advisory complexity-lens / minimality doctrine (advisory-only, never a
verdict gate). Opt out with minimality: off. Opt-out hardening: YAML 1.1
parses bare off as boolean False, which the str field previously rejected and
silently dropped to the default. The loader now coerces a boolean minimality
back to the off level before type-checking, so minimality: off (quoted or
bare) reliably opts out. generate_default_config already wrote minimality: lite
for new projects, so generated configs are unchanged; only keyless/invalid
fallbacks move from off to lite. Regression tests pin the new default, the
bare-off opt-out, the invalid → lite fallback, run-health stamping, and the
doctrine/lens activation at the lite default.
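The bare-off hardening above comes down to one coercion before the type check. The sketch below works on an already-parsed value so it needs no YAML library; resolve_minimality is a hypothetical name, and treating a boolean True as invalid is an assumption the notes do not spell out.

```python
from typing import Any

VALID_LEVELS = ("off", "lite", "full", "ultra")  # levels mentioned across these notes
DEFAULT_LEVEL = "lite"                           # keyless/invalid fallback after the flip


def resolve_minimality(raw: Any) -> str:
    """Resolve a parsed config value to a minimality level.

    A YAML 1.1 parser turns bare `off` into the boolean False, so False is
    coerced back to "off" BEFORE the string check would reject it.
    """
    if raw is False:
        return "off"
    if isinstance(raw, str) and raw in VALID_LEVELS:
        return raw
    # Missing key, wrong type, unknown level (and, by assumption, True): use the default.
    return DEFAULT_LEVEL


for value in (None, False, "off", "full", "bogus"):
    print(repr(value), "->", resolve_minimality(value))
```

Without the first branch, False would fall through to the default, which is exactly the silent opt-out failure the entry describes once the default became lite.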
Added
- /map-plan Step 0.6: Verify Live/Runtime State gate (#243). A new
depends_on_runtime_state workflow-fit signal (6th signal on
record_workflow_fit, default false; CLI flag --depends-on-runtime-state,
legacy positional path unchanged) arms a gated Step 0.6 between the
Already-Implemented gate (Step 0.5) and decomposition. It is the runtime
analogue of Step 0.5: where 0.5 stops you re-planning code that already
exists, 0.6 stops you planning against runtime facts that have drifted
(prod row counts, enum labels actually present in a live DB, a column that
already exists, the applied migration head, a live feature-flag value).
Each assumption is either verified read-only through an approved source
(replica/dashboard/metadata query; cite the derived fact, never persist
prod rows/PII/secrets into .map/<branch>/artifacts) or recorded as an
Unverified Runtime Assumption in the spec's Open Questions / Risks with the
exact check to run, with dependent subtasks marked provisional. The skill is
a planning-time gate, not a runtime tool: it suggests the read-only checks and
defers execution to the operator or an authorized sub-agent; it never
hard-stops merely because prod is unreachable. Mirrored into the Codex
$map-plan surface; detail + examples + safety guardrails live in the bundled
plan-reference.md (the active SKILL body stays under its line budget).
WORKFLOW_FIT_DECISION_SCHEMA gains the optional depends_on_runtime_state
boolean (not required, so pre-existing workflow-fit.json files still
validate). Design pressure-tested via llm-council (deep mode). New regression
tests pin the signal round-trip, the keyword CLI flag, the legacy-positional
default, schema backward-compat, and the gate prose across all rendered
Claude + Codex trees.
- Opt-in cache-friendly prompt layering for reviewer fan-out (Part of #231).
.map/config.yaml now accepts prompt_layering: docs_first | stable_first
(default docs_first, behavior unchanged). docs_first keeps the historical
attention-optimized envelope (variable <documents> first, stable contract
last). stable_first reorders the stable <task>/<workflow_policy>/
<instructions>/<expected_output> contract ahead of the variable documents
so it forms a byte-identical prefix across repeated same-role Monitor/
Predictor/Evaluator (and complexity-lens) dispatches, the precondition for an
automatic prefix-cache hit. _render_review_prompt and
_render_complexity_lens_prompt route through a shared _layer_prompt_sections
helper; build_review_prompts reads _load_prompt_layering() and echoes the
active mode as prompt_layering in its result. Registered + validated on
MapConfig and documented (commented) in the generated config. The
attention-vs-cache tradeoff is unproven, so the default does not flip: it
is gated on a measured docs_first vs stable_first comparison (incremental
cache_read + no quality regression). The measurement recipe and the
harness-owned-dispatch constraint are documented under "Prompt Layering &
Prefix Caching" in docs/ARCHITECTURE.md. The token-accounting cache_read
double-count that the issue cited as a measurement caveat was already fixed and is
regression-tested, so the comparison numbers are trustworthy. New
tests/test_prompt_layering.py pins docs_first byte-identity and the
stable_first prefix invariant.
- Agent-Boundary Doctrine: written down + every live hand-off audited
independent | relay (#230). docs/ARCHITECTURE.md now carries the explicit
criterion: keep a separate sub-agent only when it adds an independent /
adversarial perspective; collapse any pure-relay hop (a context that only
paraphrases a prior agent's output, emitting no new verdict) into its caller.
It is a substance rule, not a wiring rule. The doctrine includes a ground-truth
audit (classified from actual subagent_type="…" dispatch sites, not docs): all
8 pipeline-dispatched agents emit independent verdicts and none is a relay; the
only relay hops the doctrine condemns (the Self-MoA synthesizer/debate-arbiter)
were already collapsed in #240. The audit also resolves the orphaned
documentation-reviewer (zero skill dispatch sites) as a deliberate keep: it
emits a unique, non-relay verdict, so it is retained as an optional,
user-dispatchable agent (invoke via Task(subagent_type="documentation-reviewer"))
and now self-declares a Dispatch status: annotation. A new
tests/test_agent_dispatch_audit.py enforces the invariant going forward: any
agent shipped with no dispatch site and not marked optional fails the gate,
preventing a silent orphan from recurring.
- Hand-authored RESEARCH artifacts now self-correct on the first reject, and
the exact contract is documented (#228, follow-up to #197). The documented
save_research path ("save direct current-session findings") used to cost 2-3
validate_research rejects because the strict schema enforced by the validator
(status enum, confidence float, search_stats field names, lines: [start, end]
with a ≤200-line span) lived only in code, while the SKILL prose implied free
text ("complete", "high", files_examined). Now: (1) validate_research
echoes a copy-pasteable, structurally-valid artifact in a skeleton field on
ANY failure (bad JSON, wrong types, or a missing artifact), so the first reject
is self-correcting: copy it, swap your values, re-save; (2) the exact field
table + the same skeleton are documented under "RESEARCH artifact schema" in
the map-efficient efficient-reference.md (Claude and Codex), with the SKILL
RESEARCH section naming the exact status enum and pointing at it. Validator
behavior is unchanged for valid artifacts (no skeleton field is added).
- **Compaction now offloads large tool outputs for on-demand retrieval in...
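The byte-identical-prefix property from the prompt-layering entry above can be shown with a toy sketch. layer_prompt is a hypothetical stand-in for the shared layering helper, and the section strings are invented; the sketch only demonstrates the ordering effect on the shared prefix and says nothing about whether any cache actually hits.

```python
import os
from typing import List


def layer_prompt(stable: List[str], documents: str, mode: str = "docs_first") -> str:
    """Join prompt sections in the order implied by the layering mode."""
    if mode == "docs_first":
        parts = [documents, *stable]   # variable documents first, stable contract last
    elif mode == "stable_first":
        parts = [*stable, documents]   # stable contract first, variable documents last
    else:
        raise ValueError(f"unknown prompt_layering mode: {mode}")
    return "\n".join(parts)


stable = ["<task>review the diff</task>", "<instructions>be strict</instructions>"]
docs_a = "<documents>diff A</documents>"
docs_b = "<documents>diff B</documents>"

# Longest shared prefix of two dispatches that differ only in their documents.
shared_stable_first = os.path.commonprefix(
    [layer_prompt(stable, docs_a, "stable_first"), layer_prompt(stable, docs_b, "stable_first")]
)
shared_docs_first = os.path.commonprefix(
    [layer_prompt(stable, docs_a, "docs_first"), layer_prompt(stable, docs_b, "docs_first")]
)
print(len(shared_stable_first), len(shared_docs_first))
```

Under stable_first the whole contract sits inside the shared prefix; under docs_first the prefix ends at the first differing byte of the documents.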
MAP Framework v3.17.1
[3.17.1] - 2026-06-18
Fixed
- Broken and misleading prose in lower-tier MAP skill prompts (prompt-quality
audit). Repaired shipped SKILL.md defects surfaced by a PQS audit: the
/map-tdd ACTOR example carried an unterminated f""" string (would break on
copy-paste) plus a duplicated <TDD_Tests> placeholder; /map-state declared
three conflicting versions (frontmatter 1.0.0, metadata 3.1.0, footer
1.0.0), now all 3.1.0; the auto-generated Troubleshooting footer in
/map-fast, /map-debug, /map-tdd, and /map-release referenced a
non-existent "What this command CANNOT do" section and shipped a
<typical args> placeholder Examples block; the /map-release validation-gate
matrix listed a "Black format" gate that make check never runs (black is
make format only); and /map-skill-eval Troubleshooting required a
non-existent id field on eval-set entries (cell_ids are derived).
Changed
- Strengthened inhibition (NEVER rules) and output contracts in read/write MAP
skills. /map-state, /map-tokenreport, /map-memory-now, and
/map-skill-eval gained explicit Constraints (NEVER) blocks (single-writer
enforcement, no direct step_state.json/run-log edits, read-only guarantees,
no auto-persisting secrets or flipping user config) plus fixed output-report
templates and a skill-eval self-check, raising prompt quality without changing
runtime behavior. The /map-debug, /map-fast, /map-tdd, and /map-release
Examples/Troubleshooting sections now reference real sections and real example
invocations.
MAP Framework v3.17.0
Added
- /map-understand interactive learning mode (#221). MAP now ships an
opt-in deep-understanding slash surface for Claude and Codex. It keeps a
transient Markdown checklist in the conversation, teaches code/diffs/workflow
artifacts incrementally, asks restatement or quiz checks without revealing
multiple-choice answers early, and stays separate from normal workflow
verbosity and /map-learn persistence.
- Minimality rollout telemetry can now be inspected before the Phase 3 default
flip (#180/#183). run_health_report.json records the workflow's historical
minimality level, and mapify minimality-report compares complete off and
opt-in cohorts for retry pressure, guard rework, and deferred-YAGNI reversal
rate before marking the local rollout as candidate, hold, or
insufficient_data. The report summary now includes sample_gaps,
cohort_branches, next_actions, and a candidate-only manual_review_gate
with opt-in branches plus a clarity/underscope checklist, so maintainers can
see the exact telemetry, stale historical-minimality branches, and human review
still needed before promotion.
- Decomposer pruning is now contract-gated and user-visible (#184).
Blueprints can carry requiredness/pruneable metadata per active subtask
and a deferred_yagni parking lot for speculative omissions. The validator
rejects non-empty deferred_yagni under minimality: off/lite, requires
explicit REVIEW_PLAN approval warnings under full/ultra, and Actor context
now preserves approved omissions so they are not silently implemented or lost.
- Deferred YAGNI items can be restored before approval (#184).
map_orchestrator.py restore_deferred_yagni YG-NNN moves one parking-lot
item into active subtasks, appends it to the task plan, and clears prior plan
approval so REVIEW_PLAN cannot proceed on stale scope.
- Research-agent localization quality can now be scored deterministically
(#200). Maintainers can parse ResearchEvidence JSON or path:line[-end]
text citations, validate them against a fixture repo, and compute file-level
plus line-overlap precision/recall/F1 without live provider credentials.
The scorer is exposed as mapify research-eval score and covered by the
no-provider E2E artifact-contract suite.
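The file-level and line-overlap metrics named in the research-eval entry can be sketched as plain set arithmetic. This is one plausible reading, not the scorer's actual definition: the function names, the (path, start, end) tuple shape, and counting overlap per individual line are assumptions.

```python
from typing import Iterable, Set, Tuple

Citation = Tuple[str, int, int]  # (path, start_line, end_line), inclusive


def precision_recall_f1(predicted: Set, gold: Set) -> Tuple[float, float, float]:
    """Standard set-based precision, recall, and F1 with zero-division guards."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def expand_lines(citations: Iterable[Citation]) -> Set[Tuple[str, int]]:
    """Expand each cited range into individual (path, line) pairs."""
    return {(path, line) for path, start, end in citations for line in range(start, end + 1)}


predicted = [("a.py", 10, 19), ("b.py", 1, 5)]
gold = [("a.py", 15, 24), ("c.py", 1, 5)]

file_scores = precision_recall_f1({p for p, _, _ in predicted}, {p for p, _, _ in gold})
line_scores = precision_recall_f1(expand_lines(predicted), expand_lines(gold))
print(file_scores, line_scores)
```

Because both inputs are fixed data, the score is reproducible without calling any provider, which is the point of the entry.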
Changed
- /map-explain now respects the user's language and scales depth to target
size (#224). The skill writes prose in the user's established language
(code, identifiers, commands, and file:line refs stay English) instead of
always defaulting to English. The rigid always-emit-all-10-sections /
explain-every-line structure is replaced by a signal-first output spec:
size tiers with word-budget ceilings and load-bearing-line caps, a front-loaded
"Mental model in 60 seconds" block, read-tier section tags
([MUST READ]/[READ IF MODIFYING]/[SKIM]), a single load-bearing-lines
table (merging the old "what every line does" + "why each line" sections,
repeated shapes explained once), before→after-first ordering for diffs,
adaptive sections with an Omitted: footer, and natural-language follow-up
offers. Applies to both the Claude and Codex surfaces.
- Research artifacts are now unified and consumed before broad search
(#209/#210). Planning and per-subtask research now share a single artifact
shape across /map-plan, /map-efficient, and the research-agent, and Actor
is required to consume the persisted research artifact before launching its
own broad codebase search. This is enforced by map_step_runner.py so research spend
is not duplicated or ignored.
MAP Framework v3.16.0
Added
- Research ROI is now visible in token and run-health diagnostics (#202).
token_accounting.json records advisory research_roi, /map-tokenreport prints per-agent cost plus research-vs-Actor/Monitor token share, and run_health_report.json summarizes persisted research artifacts, parsed status/confidence/location counts, low-confidence warnings, and token share.
MAP Framework v3.15.2
[3.15.2] - 2026-06-15
Changed
- Codex researcher now shares the Claude research-agent ResearchEvidence
contract (#198). Codex may use provider-specific search commands internally,
but /map-efficient research artifacts now explicitly preserve the same strict
JSON fields, bounded file-line evidence, and downstream Actor/Monitor
semantics across providers.
MAP Framework v3.15.1
[3.15.1] - 2026-06-15
Fixed
- /map-efficient now distinguishes mandatory RESEARCH artifacts from conditional research-agent delegation (#201). Hook hints, Claude/Codex workflow skills, orchestrator validation errors, and docs now tell operators to persist a research artifact before Actor while using research-agent/researcher only for broad, high-risk, or unclear discovery.
MAP Framework v3.15.0
Added
- MAP RESEARCH artifacts are now validated before Actor work (#197).
validate_research checks strict JSON, confidence/status/search stats,
bounded file-line evidence, safe relative paths, and over-broad location lists;
validate_step 2.2 now blocks malformed or missing research before Actor can
consume it.
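Two of the checks listed above, safe relative paths and bounded file-line evidence, might look something like this sketch. The helper names are hypothetical, the 200-line cap is borrowed from the later v3.18.0 note and may not match this release, and counting the span inclusively is an assumption.

```python
from pathlib import PurePosixPath
from typing import Sequence


def is_safe_relative_path(path: str) -> bool:
    """Accept only non-empty, repo-relative paths with no parent traversal."""
    if not path:
        return False
    parsed = PurePosixPath(path)
    return not parsed.is_absolute() and ".." not in parsed.parts


def is_bounded_span(lines: Sequence[int], max_span: int = 200) -> bool:
    """Accept [start, end] with 1 <= start <= end and at most max_span lines (inclusive)."""
    if len(lines) != 2 or not all(isinstance(n, int) for n in lines):
        return False
    start, end = lines
    return 1 <= start <= end and (end - start + 1) <= max_span


print(is_safe_relative_path("src/app.py"), is_safe_relative_path("../secret"))
print(is_bounded_span([10, 40]), is_bounded_span([1, 500]))
```

Rejecting these cases before Actor runs keeps a malformed citation from pointing the next step at a file outside the repository or at an unreviewably large range.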