feat(audit): AI-SDLC measurement engine — weighted-category scoring + deterministic engine + 3-tab report (full re-architecture) #139
Self-contained, zero-context implementation spec capturing the full approved design (decision log, standards.toml schema, collector/metric contracts, phased task list) plus the CEO/CTO exec-deliverable mock. Supersedes the pre-pivot draft under docs/superpowers/ (git-ignored).
…ighted-category model
…points + reliability
📝 Walkthrough
AWOS audit metadata is bumped to 2.5.0, and the audit engine is expanded with deterministic collectors, registries, detectors, metrics, rollup logic, and HTML/Markdown rendering. Documentation, standards, and tests are updated to match the new weighted-category scoring and provenance flow.
Changes
AWOS audit engine refresh
Sequence Diagram(s)
sequenceDiagram
participant cli_ts
participant audit_core_ts
participant collectors_ts
participant metrics_ts
participant render_ts
cli_ts->>audit_core_ts: auditCore(repoPath, outDir, DETECTORS, METRICS, standardsPath)
audit_core_ts->>collectors_ts: write collected/*.json
audit_core_ts->>metrics_ts: compute per-dimension JSON
audit_core_ts->>cli_ts: return AuditCoreSummary
cli_ts->>render_ts: renderMarkdown(audit.json) or renderHtml(audit.json)
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~90+ minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 7
🧹 Nitpick comments (1)
tests/lint-prompts.test.js (1)
1238-1241: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Parse only check headings when building dimension blocks.
Line 1239 splits on every `###` heading, but the contract is about `### CODE-NN` check blocks. This is brittle if non-check subheadings are added later.
Suggested fix
- // Split into check blocks by the "### CODE-NN:" headings.
- const blocks = body.split(/^### /m).slice(1);
+ // Split into check blocks by the "### CODE-NN:" headings.
+ const blocks = body.split(/^###\s+[A-Z]+-\d+:/m).slice(1);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/lint-prompts.test.js` around lines 1238 - 1241, The block parsing in the lint prompt test is too broad because it splits on every “###” heading instead of only the check-block headings. Update the logic around body.split and the subsequent block-processing loop in the test to match only “### CODE-NN” headings, so extra non-check subheadings are ignored. Keep the rest of the dimension-block extraction flow the same, but ensure the heading filter is specific to the contract enforced by this test.
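One way to restrict the split to check headings while keeping each heading attached to its block is a zero-width lookahead. The sketch below is an illustration only: `splitCheckBlocks` and the sample body are hypothetical names and data, and the real test file may structure this differently.

```typescript
// Hypothetical helper: split a dimension body into check blocks.
// The lookahead (?=...) matches at the start of a "### CODE-NN:" line without
// consuming it, so every returned block still begins with its own heading.
function splitCheckBlocks(body: string): string[] {
  return body
    .split(/^(?=###\s+[A-Z]+-\d+:)/m)
    // Drop any preamble that precedes the first check heading.
    .filter((block) => /^###\s+[A-Z]+-\d+:/.test(block));
}

// Made-up sample: two check blocks plus one non-check subheading.
const sampleBody = [
  "## Checks",
  "### SBP-06: Error handling",
  "Some text.",
  "### Notes",
  "Not a check block.",
  "### SBP-08: Python source",
  "More text.",
].join("\n");

const checkBlocks = splitCheckBlocks(sampleBody);
console.log(checkBlocks.map((block) => block.split("\n")[0]));
```

With this shape the `### Notes` subheading stays inside the preceding check block instead of starting a new one, and the check id remains available to whatever code processes each block.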
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/design/ai-sdlc-exec-deliverable.md`:
- Line 16: The Markdown ASCII diagram fences in the document are unlabeled,
which breaks lint/tooling consistency; update the fenced blocks in the affected
sections to use a language tag such as text. Make this change for each ASCII
block by editing the relevant fenced blocks in the document so the existing
content stays the same but the fences are explicitly labeled, matching the style
used by other Markdown examples.
In `@docs/design/ai-sdlc-measurement-and-scoring-plan.md`:
- Line 40: The implementation contract in the metrics section uses a
machine-local absolute path, which should be removed for portability. Update the
reference in the metrics/ description to point generically to the existing
complexity scanner using a repo-relative or tool-agnostic identifier instead of
/Users/aleksandrmakarov/code/scripts/complexity-scan.py, while keeping the rest
of the metrics contract unchanged.
In
`@plugins/awos/skills/ai-readiness-audit/references/ai-sdlc-metrics-catalog.md`:
- Line 31: The ADP-G7 revert pattern is too narrow because the `^Revert"` match
misses standard revert subjects like `Revert "..."`, which can undercount
failures in the metrics guidance. Update the revert/rollback pattern in the
ADP-G7 entry of the metrics catalog so it matches the common spaced form used by
revert commits, while keeping the existing hotfix and rollback terms intact.
In `@plugins/awos/skills/ai-readiness-audit/references/data-sources.md`:
- Around line 98-101: The global SKIP rule in data-sources.md conflicts with
metric-specific requirements such as MTTR’s need for a real incident source.
Update the wording around the partial source and SKIP rule sections to state
that the generic fallback applies only unless a metric defines stricter
required-source contracts, and explicitly note that metric-specific rules like
MTTR override the default behavior. Keep the guidance aligned with the existing
metrics/ layer language so readers can tell when a metric should still skip
despite partial source availability.
In `@plugins/awos/skills/ai-readiness-audit/scoring.md`:
- Around line 20-22: Add a language tag to each unlabeled fenced code block in
the scoring markdown so it passes MD040. Update the fenced examples around the
dimension_score, coverage_ratio, and audit_total formulas to use a consistent
annotation such as text, keeping the existing content unchanged. Use the
markdown sections containing those formulas as the target for the fix.
In `@tests/lint-prompts.test.js`:
- Line 1118: The lint prompt test is using an overly broad regex that can match
unrelated words like “degraded” or “upgrade.” Update the assertion in the test
around assert.doesNotMatch in tests/lint-prompts.test.js to use a bounded
“grade” pattern that only matches the standalone word, keeping the intent of
dimension-auditor must not emit a grade while avoiding false failures.
- Around line 1008-1030: The current check in lint-prompts.test.js only verifies
that required keys exist somewhere in standards.toml, so malformed [category.*]
tables can still pass. Update the assertion logic around the existing
[category.*] scan to validate each category table block individually, using the
same symbols/loop in the test, and ensure every table contains all required keys
within its own section rather than relying on file-wide matches.
---
Nitpick comments:
In `@tests/lint-prompts.test.js`:
- Around line 1238-1241: The block parsing in the lint prompt test is too broad
because it splits on every “###” heading instead of only the check-block
headings. Update the logic around body.split and the subsequent block-processing
loop in the test to match only “### CODE-NN” headings, so extra non-check
subheadings are ignored. Keep the rest of the dimension-block extraction flow
the same, but ensure the heading filter is specific to the contract enforced by
this test.
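The per-table validation requested for `[category.*]` can be sketched as a small line scanner that tracks which keys appear inside each table's own section. This is a rough illustration under stated assumptions: `REQUIRED_KEYS` is a made-up key set, `missingKeysPerCategory` is a hypothetical helper, and the scanner ignores multi-line strings and trailing comments on header lines.

```typescript
// Assumed key set for illustration only; the real contract lives in the test.
const REQUIRED_KEYS = ["code", "method", "last_verified"];

// For each [category.*] table, list the required keys missing from that
// table's own block (not from the file as a whole).
function missingKeysPerCategory(toml: string): Map<string, string[]> {
  const result = new Map<string, string[]>();
  let current: string | null = null;
  let seen = new Set<string>();

  const flush = (): void => {
    if (current !== null) {
      result.set(current, REQUIRED_KEYS.filter((key) => !seen.has(key)));
    }
  };

  for (const line of toml.split(/\r?\n/)) {
    const header = line.match(/^\s*\[([^\]]+)\]\s*$/);
    if (header) {
      flush(); // close the previous table before opening the next one
      const name = header[1].trim();
      current = name.startsWith("category.") ? name : null;
      seen = new Set();
      continue;
    }
    const key = line.match(/^\s*([A-Za-z0-9_-]+)\s*=/);
    if (key && current !== null) seen.add(key[1]);
  }
  flush();
  return result;
}
```

A test could then assert that every entry in the returned map is an empty list, which fails on the specific malformed table rather than passing because another table happens to contain the key.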
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 63f87513-b53d-4915-871a-9d0780bf7a55
📒 Files selected for processing (25)
.claude-plugin/marketplace.json
docs/design/ai-sdlc-exec-deliverable.md
docs/design/ai-sdlc-measurement-and-scoring-plan.md
plugins/awos/.claude-plugin/plugin.json
plugins/awos/agents/dimension-auditor.md
plugins/awos/skills/ai-readiness-audit/SKILL.md
plugins/awos/skills/ai-readiness-audit/dimensions/ai-development-tooling.md
plugins/awos/skills/ai-readiness-audit/dimensions/code-architecture.md
plugins/awos/skills/ai-readiness-audit/dimensions/documentation.md
plugins/awos/skills/ai-readiness-audit/dimensions/end-to-end-delivery.md
plugins/awos/skills/ai-readiness-audit/dimensions/project-topology.md
plugins/awos/skills/ai-readiness-audit/dimensions/prompt-agent-integrity.md
plugins/awos/skills/ai-readiness-audit/dimensions/quality-assurance.md
plugins/awos/skills/ai-readiness-audit/dimensions/security.md
plugins/awos/skills/ai-readiness-audit/dimensions/software-best-practices.md
plugins/awos/skills/ai-readiness-audit/dimensions/spec-driven-development.md
plugins/awos/skills/ai-readiness-audit/dimensions/supply-chain-security.md
plugins/awos/skills/ai-readiness-audit/output-format.md
plugins/awos/skills/ai-readiness-audit/references/ai-sdlc-metrics-catalog.md
plugins/awos/skills/ai-readiness-audit/references/data-sources.md
plugins/awos/skills/ai-readiness-audit/references/standards.md
plugins/awos/skills/ai-readiness-audit/references/standards.toml
plugins/awos/skills/ai-readiness-audit/report-template.md
plugins/awos/skills/ai-readiness-audit/scoring.md
tests/lint-prompts.test.js
… bundle + CI job)
- Install devDeps: typescript, tsx, esbuild, @types/node; runtime: smol-toml
- Add tsconfig.json (NodeNext, strict, allowImportingTsExtensions, noEmit)
- Add tests/helpers.ts: loadStandards() via smol-toml, writeCollected() fixture helper
- Add tests/smoke.test.ts: asserts meta.monthly_bucket_days===30, max_lookback_days===730
- Add scripts/build-engine.mjs: esbuild driver bundling collectors/detectors/metrics entrypoints
- Create collectors/, detectors/, metrics/, dist/ scaffold dirs with .gitkeep
- Add test:engine and build:engine scripts; fold test:engine into npm test
- Add node-engine CI job to quality-check.yml
- Add TS scaffold presence lint check to tests/lint-prompts.test.js (56 tests, all pass)
…mplify build-engine
- Add `npm ci` step to the `test` job in quality-check.yml so tsx is available
when `npm test` chains into `test:engine` (fixes MODULE_NOT_FOUND on CI).
- Remove `allowScripts` from package.json (Bun-only convention; npm ignores it).
- Replace dynamic `import('node:fs')` in build-engine.mjs with a direct
`writeFileSync` call, adding it to the existing static import.
…categories + lint + schema test
Classifies every [category.*] table in standards.toml with a method field: computed (numeric result), detected (deterministic boolean from regex/glob/AST/config), or judgment (semantic sampling required). The 7 judgment categories additionally carry rubric and evidence_required. Both the JS regex lint (tests/lint-prompts.test.js) and a new TypeScript engine schema test (standards-schema.test.ts) guard the vocabulary and the judgment-requires-rubric contract. standards.md documents the Method section. Prettier clean.
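The method vocabulary and the judgment-requires-rubric contract could be checked along these lines. This is a minimal sketch, not the actual standards-schema.test.ts: `CategoryTable` and `validateCategory` are hypothetical, and treating `rubric` as a non-empty string is an assumption.

```typescript
// Hypothetical shape of one parsed [category.*] table; only `method`, `rubric`
// and `evidence_required` are named in the commit message.
interface CategoryTable {
  method?: unknown;
  rubric?: unknown;
  evidence_required?: unknown;
}

const METHODS: readonly string[] = ["computed", "detected", "judgment"];

// Returns a list of contract violations for one category (empty when valid).
function validateCategory(name: string, table: CategoryTable): string[] {
  const errors: string[] = [];
  if (typeof table.method !== "string" || !METHODS.includes(table.method)) {
    errors.push(`${name}: method must be one of ${METHODS.join(", ")}`);
    return errors;
  }
  if (table.method === "judgment") {
    // Assumption: a rubric is a non-empty string.
    if (typeof table.rubric !== "string" || table.rubric.trim() === "") {
      errors.push(`${name}: judgment category requires a rubric`);
    }
    if (table.evidence_required === undefined) {
      errors.push(`${name}: judgment category requires evidence_required`);
    }
  }
  return errors;
}
```

Returning every violation rather than throwing on the first one lets a schema test report all malformed categories in a single run.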
…erminism over LLM sampling)
Adds collectors/git.ts (always-available Tier-G collector) that shells to git via execFileSync to gather default_branch, monthly_buckets, merge_records, revert_merges, total_merges, ai_marked_commits, total_commits, tooling_paths, and numstat_totals. Hermetic node:test suite builds a throwaway git repo with pinned GIT_AUTHOR_DATE / GIT_COMMITTER_DATE for fully deterministic assertions. Also adds plugins/awos/skills/ai-readiness-audit/dist/ to .prettierignore so generated bundles are not flagged by prettier.
… (drop Date.now), remove dead code
- buildMonthlyBuckets: window end is now the latest committer date from git (git log --all --format=%cI --max-count=1); since = latestCommitDate − lookback_days. Date.now() is gone — buckets are a pure function of git history + period params.
- Removed dead rangeOut call in getMergeRecords (fired nonsensical sha^2..sha^2 range, result was discarded immediately).
- Removed unused countLines helper.
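The idea of buckets as a pure function of the latest commit date and the period parameters can be sketched as follows. This is an illustration, not the engine's buildMonthlyBuckets: the `Bucket` shape, the `buildBuckets` name, and the choice to walk backwards from the window end are all assumptions.

```typescript
interface Bucket {
  start: string; // ISO timestamp, inclusive
  end: string; // ISO timestamp, exclusive upper edge of the bucket
}

const DAY_MS = 24 * 60 * 60 * 1000;

// latestCommitIso would come from `git log --all --format=%cI --max-count=1`.
// No wall-clock read happens here, so identical inputs give identical buckets.
function buildBuckets(latestCommitIso: string, lookbackDays: number, bucketDays: number): Bucket[] {
  const end = new Date(latestCommitIso).getTime();
  const since = end - lookbackDays * DAY_MS;
  const buckets: Bucket[] = [];
  // Assumption: walk backwards so the newest bucket is always full-width and
  // any partial bucket falls at the old end of the window.
  for (let hi = end; hi > since; hi -= bucketDays * DAY_MS) {
    const lo = Math.max(since, hi - bucketDays * DAY_MS);
    buckets.unshift({ start: new Date(lo).toISOString(), end: new Date(hi).toISOString() });
  }
  return buckets;
}
```

Calling `buildBuckets` twice with the same commit date and parameters yields the same list, which is what makes hermetic tests with pinned commit dates possible.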
…ause syntax fix
- Add detectors/software_best_practices.ts with three detectors:
detectExceptClauseDefect (2706): FAILs on Python-2 `except A, B:` syntax
detectErrorHandling (2704): heuristic over catch/except blocks — FAIL/WARN/PASS
detectLockfiles (2705): PASS if any recognised lockfile present
- Add DETECTORS map: { 2704, 2705, 2706 } → detect functions
- Add [category.sbp_except_clause_syntax] (code 2706, method=detected) to standards.toml
- Update SBP-06 in software-best-practices.md: Category 2704, 2706
- 14 hermetic unit tests; all gates green (39 engine, 56 lint-prompts, build:engine, prettier)
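A detector for the Python-2 `except A, B:` form can be approximated with a line-level regex. The sketch below is a rough stand-in for detectExceptClauseDefect, not its actual implementation: `findPy2ExceptClauses` and `ExceptFinding` are hypothetical, and a line scan will miss clauses split across lines.

```typescript
// Matches `except A, B:` with no parentheses and no `as`, the Python 2 form.
// Parenthesised tuples and `except X as e:` are intentionally not matched.
const PY2_EXCEPT = /^\s*except\s+[A-Za-z_][\w.]*\s*,\s*[A-Za-z_][\w.]*\s*:/;

interface ExceptFinding {
  line: number; // 1-based line number
  text: string; // trimmed source line
}

function findPy2ExceptClauses(source: string): ExceptFinding[] {
  const findings: ExceptFinding[] = [];
  source.split(/\r?\n/).forEach((text, index) => {
    if (PY2_EXCEPT.test(text)) {
      findings.push({ line: index + 1, text: text.trim() });
    }
  });
  return findings;
}
```

A detector built on this could return FAIL when the findings list is non-empty and carry the line numbers as evidence.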
…llect/detect/metric)
Replace per-module multi-entrypoint bundling with a single cli.ts that esbuild inlines into dist/cli.js. The dispatcher handles `collect <source> <repoPath>`, `detect <code> <repoPath>`, and `metric <id>` (stubbed, exits non-zero until metric modules land). build-engine.mjs cleans dist/ before building so all stale flat + nested files are removed. Adds hermetic smoke tests in cli.test.ts.
A check whose precondition is absent used to award a vacuous PASS, letting an empty or low-maturity repo read as compliant (dossier 02-metric-range-and-interference.md §A3). Absence now emits SKIP, which excludes the check from the coverage denominator:
- ARCH-02: no source files, or no files under recognised layer dirs
- SDD-03: no architecture document to match against
- SDD-05/SDD-06: no spec directories
- SEC-05: no sensitive file types relevant to the stack
- SBP-08: no Python source (belt-and-braces with topology.has_python)
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
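The effect of SKIP on the coverage denominator can be shown with a small sketch. It is an illustration under assumptions: the `Check` shape and field names are hypothetical, and returning `null` when no weight is applicable follows the behavior described later in this PR.

```typescript
type Status = "PASS" | "WARN" | "FAIL" | "SKIP";

// Hypothetical check record; field names are illustrative.
interface Check {
  id: string;
  status: Status;
  weight: number;
  weightAwarded: number;
}

// Coverage = awarded weight / applicable weight, where SKIP checks are not
// applicable and so contribute to neither side of the ratio.
function coverageRatio(checks: Check[]): number | null {
  let awarded = 0;
  let applicable = 0;
  for (const check of checks) {
    if (check.status === "SKIP") continue;
    applicable += check.weight;
    awarded += check.weightAwarded;
  }
  // No applicable weight means "could not measure", not "scored zero".
  return applicable > 0 ? awarded / applicable : null;
}
```

Under a vacuous-PASS policy an empty repo would score full marks on every absent precondition; with SKIP the same repo simply has a smaller denominator.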
Restructures the dimension taxonomy so each dimension is a coherent capability area and the report reads industry-standard engineering first, AI-frontier last (scores change; audit is unreleased):
- New Delivery Flow dimension: the DORA family moves out of the ai-sdlc-adoption grab-bag — DF-01 deploy frequency, DF-02 lead time, DF-03 PR cycle time, DF-04 change-failure rate, DF-05 review rework, DF-06 rework rate, DF-07 MTTR.
- New Descriptors dimension (unscored, rendered last): DESC-01 contributors, DESC-02 churn, DESC-03 complexity, DESC-04 scale, DESC-05 dependency counts. All weight 0 with a neutral INFO badge — size/activity describe a repo, they don't grade it — and the headline Merges/LOC throughput echo moves onto this page. Fixes the cannot-reach-zero saturators (dossier §A2/C2).
- Security dimension dissolved: SEC-01/03/05 → Application Security (AS-12 env gitignored, AS-13 env template, AS-14 sensitive-file ignore coverage); SEC-02 → AI Security (AIS-07 agent guardrail hooks); SEC-04 dropped — it duplicated AS-05 (no hardcoded secrets).
- End-to-End Delivery dimension dissolved: E2E-01/04 → Software Best Practices (SBP-09 vertical delivery, SBP-10 no orphaned artifacts), E2E-03 → Documentation (DOC-07 spec traceability), E2E-05 → Code Architecture (ARCH-07 cross-layer tooling). Mis-seeded source fields corrected (SBP-09 is a git signal, not a DORA citation).
- prompt-agent-integrity renamed to ai-security (PAI-NN → AIS-NN).
- Dimension order is data: standards.toml [meta].dimension_order; audit-core stamps order/title/description onto every artifact, aggregate preserves it, and the renderer shows each dimension's description as a hover tooltip on the summary row and dim page.
- standards.toml physically regrouped by the new order.
Weight spread across scored dimensions is now 27-86 (was 16-139).
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
Squash/rebase-merging a PR leaves no merge commit, so every metric built on `git log --merges` silently read 0 on healthy repos, and per-author merge counts credited only whoever clicked the merge button (dossier 03-squash-merge-blind-spot.md).
- The git collector now counts squash-merged PRs as merge events: first-parent non-merge trunk commits carrying a forge PR ref — GitHub "Title (#123)", Azure DevOps "Merged PR 123: …", Bitbucket "(pull request #12)", GitLab "See merge request …!45" in the body — attributed to the commit author, which for a squash merge IS the PR author. window_stats gains merge_commits / squash_merges / merge_strategy (merge-commit | squash | mixed | unknown).
- Deploy frequency, change-failure rate, and rework rate now measure squash repos correctly (their revert/fix keyword filters cover the squashed subjects too).
- Lead time, PR cycle time, and review rework are merge-record proxies that cannot exist without real merge commits: on a squash-strategy repo they SKIP with a connector-pointing reason instead of reporting a confident number from unrepresentative residue. The MTTR git proxy stays included (per its contract) but degrades confidence with an explanatory note.
- activeContributors: merge-share now includes squash events, restoring the safety valve on squash repos; when a repo has no merge events at all, commit-share replaces merge-share so the rule can't degenerate to LOC-share-only (the '1 active of 9' collapse).
Validated on the dossier's evidence repo: 19 merges credited to one maintainer → 114 merge events across the 4 real PR authors, strategy=squash.
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
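The four forge PR-reference shapes named above can be recognised with one regex each. The following is a sketch under assumptions, not the collector's code: `detectSquashPrRef`, the `Forge` labels, and the exact patterns are hypothetical approximations of the quoted subject and body formats.

```typescript
type Forge = "github" | "azure-devops" | "bitbucket" | "gitlab";

interface PrRef {
  forge: Forge;
  number: number;
}

// Inspect a first-parent, non-merge commit's subject (and body, for GitLab)
// for a forge PR reference. Returns null when none is found.
function detectSquashPrRef(subject: string, body = ""): PrRef | null {
  const rules: Array<[Forge, RegExp, string]> = [
    ["azure-devops", /^Merged PR (\d+):/, subject], // "Merged PR 123: ..."
    ["bitbucket", /\(pull request #(\d+)\)/, subject], // "... (pull request #12)"
    ["github", /\(#(\d+)\)\s*$/, subject], // "Title (#123)"
    ["gitlab", /See merge request \S*!(\d+)/, body], // body: "See merge request group/project!45"
  ];
  for (const [forge, pattern, text] of rules) {
    const match = pattern.exec(text);
    if (match) return { forge, number: Number(match[1]) };
  }
  return null;
}
```

A collector could count each non-null result as one merge event attributed to the commit author and derive a strategy label from the ratio of these events to true merge commits.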
Report fixes from the barley validation run:
- Headline VALUES now carry tooltips: the underlying check's evidence
(how the number was derived) with the check id + status as meta; an
absent value ('—') explains itself with the check's skip reason
instead of standing bare.
- Metric-routed SKIPs surface the metric's own reliability note (e.g.
'squash-merge workflow: no branch merge records…') instead of the
generic 'required data was not available'.
- Active Contributors tooltip states the real share-based rule (active
unless BOTH merge-share and LOC-share fall below the 5% threshold),
not an invented ≥2-commits heuristic.
- Change-failure definition describes what is actually computed (window
keyword proxy), dropping the unresolved 'within N days' placeholder;
dependency-count definition drops its SKIP boilerplate.
- Spec coverage tooltip is plain language (no 'check SDD-04' jargon).
- SDD-04 denominator is now MERGED feature work (first-parent merge
commits + squash-merged PRs, 90d window, first-parent diffs) — on
repos whose CI deletes branches after merge, live refs undercounted
badly (barley: 11 surviving branches vs ~280 merged PRs). Live-branch
evaluation remains only as a fallback for merge-less workflows.
Orchestration cost (profiled: Step 6 hand-patching was ~35 of 47 serial
model turns):
- New engine verb 'patch-judgment <outDir> <patches.json|->' applies
ALL judgment verdicts in one call — validates ids, refuses
non-judgment checks, clamps scores, derives weight_awarded, and
re-aggregates itself. SKILL.md and the repo-auditor agent now mandate
it (no hand-edited dimension JSONs, no separate aggregate).
- SKILL.md hardens the engine-call budget (one enrich, one
patch-judgment, one render --format both) and forbids interleaving
shell processing between Jira pages.
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
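The share-based Active Contributors rule stated in this commit (active unless both merge-share and LOC-share fall below the threshold) can be sketched as below. The `AuthorStats` shape and function name are hypothetical, the 5% default mirrors the threshold quoted above, and the commit-share substitution for merge-less repos follows the squash-merge commit's description.

```typescript
// Hypothetical per-author totals within the measurement window.
interface AuthorStats {
  author: string;
  mergeEvents: number;
  commits: number;
  loc: number;
}

function activeContributors(stats: AuthorStats[], threshold = 0.05): string[] {
  const sum = (pick: (s: AuthorStats) => number): number =>
    stats.reduce((total, s) => total + pick(s), 0);
  const totalMerges = sum((s) => s.mergeEvents);
  const totalCommits = sum((s) => s.commits);
  const totalLoc = sum((s) => s.loc);
  const share = (part: number, total: number): number => (total > 0 ? part / total : 0);

  return stats
    .filter((s) => {
      // With no merge events at all, commit-share stands in for merge-share
      // so the rule does not collapse to LOC-share only.
      const activityShare =
        totalMerges > 0 ? share(s.mergeEvents, totalMerges) : share(s.commits, totalCommits);
      const locShare = share(s.loc, totalLoc);
      // Inactive only when BOTH shares fall below the threshold.
      return !(activityShare < threshold && locShare < threshold);
    })
    .map((s) => s.author);
}
```

Requiring both shares to be low keeps a reviewer who merges often but writes little code, or an author who writes a lot but merges rarely, in the active set.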
'sonnet' resolves to the best available Sonnet at run time, so the harness doesn't silently pin audits to an outdated version id.
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
…eup polling
A measured run spent most of its 19 minutes in ScheduleWakeup wait loops polling background Jira/Confluence fetch agents instead of just making the calls. SKILL.md and the repo-auditor agent now state that connector fetches are inline parallel tool calls, the judgment subagent is a single foreground Agent call, and ScheduleWakeup / background tasks / fallback polling are never used in this skill.
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
The old definition — fraction of repos with ≥1 reachable data-source collector — always read 100% because git is always reachable, so the portfolio card carried no information. It is now the mean of the per-repo coverage ratios (awarded ÷ applicable weight): how much of the current standard the reachable sources could actually score. On the sample-org run this reads 69% instead of a vacuous 100%.
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
…esult segments
- SKILL.md org mode: the wait-for-subagents barrier is satisfied by the foreground Agent calls returning — never background tasks polled via ScheduleWakeup or filesystem checks. A measured org run doubled its wall time AND cost that way (9 resume segments, each reloading full context uncached).
- Harness: a session split into resume segments emits one stream-json result event per segment, and reading only the last one reported a 94-turn / 18m47s hops run as '9 turns / 78s'. stream_run now aggregates across ALL result events (turns/durations summed, cost and usage from the cumulative maximum, is_error OR-ed), measures true start-to-finish wall_ms itself, and records result_segments so a split session is visible in run-meta.json.
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
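The aggregation rule for split sessions (sum turns and durations, take the cumulative maximum for cost, OR the error flag) can be sketched as a single reduce. This is illustrative only: the harness's language and the exact field names of its result events are not shown in this PR, so `ResultEvent`, `RunSummary`, and `aggregateResults` are assumptions, and usage aggregation is omitted.

```typescript
// Assumed shape of one `result` event; field names are illustrative.
interface ResultEvent {
  num_turns: number;
  duration_ms: number;
  total_cost_usd: number; // assumed cumulative across segments
  is_error: boolean;
}

interface RunSummary {
  turns: number;
  duration_ms: number;
  cost_usd: number;
  is_error: boolean;
  result_segments: number;
}

function aggregateResults(events: ResultEvent[]): RunSummary {
  const initial: RunSummary = { turns: 0, duration_ms: 0, cost_usd: 0, is_error: false, result_segments: 0 };
  return events.reduce<RunSummary>(
    (acc, event) => ({
      turns: acc.turns + event.num_turns, // per-segment counts add up
      duration_ms: acc.duration_ms + event.duration_ms,
      cost_usd: Math.max(acc.cost_usd, event.total_cost_usd), // cumulative value: keep the max
      is_error: acc.is_error || event.is_error, // any failed segment fails the run
      result_segments: acc.result_segments + 1,
    }),
    initial,
  );
}
```

Recording `result_segments` alongside the totals makes a split session visible instead of letting the last segment masquerade as the whole run.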
Mindset shift: Coverage (share of the current standard in place) is what
a reader should judge a project by — Points is the raw material.
- The report headline is now '##.#% Coverage' with Points secondary, in
both HTML and Markdown; the tooltip cites the standard's provenance:
'Average software project score among all applicable metrics by
industry standards on <date>', where <date> is the max last_verified
across standards.toml categories, stamped into audit.json as
standards_meta by audit-core.
- Dimensions and Repositories tables swap the Coverage/Points columns
(coverage leads); per-check Points cells show the ratio too:
'1.3/8 (16.3%)'; weight-0 descriptor rows show '—'.
- Active Contributors tooltip interpolates the real threshold from
standards.toml ({threshold} placeholder resolved from standards_meta)
instead of hardcoding '5% by default' prose.
- Org report parity: per-repo reports under per-repo/ get an automatic
'← Back to org report' link (detected from the render out-dir); the
org rollup passes merged source_windows + standards_meta through so
the org header shows the measurement window; Connections & Sources
uses the same Connected/Missed template as a per-repo report with
(n/N) repo counts ('CI runs (3/8)'), and Tech Stack lists org items
the same way; Repositories column headers all carry tooltips.
- SKILL.md: org audit.json carries source_windows/standards_meta
through, and 'project' is the portfolio name only — never the inlined
repo list (the Repositories table already lists them).
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
…ime fix
Org portfolio cards — one weighting, clear names, coverage first:
- All three cards now use the SAME weighting: by active contributors
per repo (a 40-person repo moves the portfolio average more than a
2-person one), falling back to equal weights when counts are missing;
every card's description states which was used.
- Reordered and renamed: 1) 'Standards coverage' (was 'Measurement
coverage') — the contributor-weighted mean of the per-repo coverage
headlines, so the org card and the per-repo headline are explicitly
the same concept; 2) 'Capability score' (now contributor-weighted);
3) 'Repos with AI tooling' with the plain 'X of Y repositories' count
in its tooltip. The per-repo headline is renamed to
'##.#% Standards coverage' too (table columns keep the short
'Coverage').
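The contributor-weighted portfolio average with an equal-weight fallback can be sketched as follows. The `RepoCoverage` shape and `portfolioCoverage` name are hypothetical, and falling back to equal weights as soon as any repo lacks a count is an assumed policy; the commit only says the fallback applies "when counts are missing".

```typescript
interface RepoCoverage {
  repo: string;
  coverage: number | null; // null when nothing was applicable
  activeContributors?: number;
}

interface PortfolioCard {
  value: number | null;
  weighting: "contributors" | "equal"; // reported so the card can state which was used
}

function portfolioCoverage(repos: RepoCoverage[]): PortfolioCard {
  const scored = repos.filter(
    (r): r is RepoCoverage & { coverage: number } => r.coverage !== null,
  );
  if (scored.length === 0) return { value: null, weighting: "equal" };

  // Assumed policy: use contributor weights only when every repo has a count.
  const haveCounts = scored.every(
    (r) => typeof r.activeContributors === "number" && r.activeContributors > 0,
  );
  const weightOf = (r: RepoCoverage): number => (haveCounts ? (r.activeContributors as number) : 1);

  const totalWeight = scored.reduce((total, r) => total + weightOf(r), 0);
  const weightedSum = scored.reduce((total, r) => total + r.coverage * weightOf(r), 0);
  return { value: weightedSum / totalWeight, weighting: haveCounts ? "contributors" : "equal" };
}
```

For example, a 40-person repo at 50% coverage and a 2-person repo at 100% average to roughly 52.4% under contributor weighting, against 75% under equal weights.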
Tooltip quality — written for a non-technical reader:
- Org headline matrix labels ('Merges / active contributor',
'LOC / active contributor') had no dictionary entry and echoed the
label as its own tooltip; they and the Repositories column headers
now carry 2-3 plain sentences each.
- Every metric-label tooltip ends with 'Standard last verified <date>'
— per-check dates from the category's last_verified (now carried on
CheckRecord), the overall standards date elsewhere.
- Every Coverage tooltip cites 'industry standards on <date>',
including the org card and both report headlines.
Misleading skip reasons (QA-session finding): buildSkipReason let the
'connect a <source>' template win over applicability because the
pseudo-source 'audit' counted as connectable — a TypeScript library
showed ~16 checks demanding 'connect a audit source' when the truth was
'this repo has no web app'. Applicability now wins, only real connector
sources (tracker/docs/ci/incident) prompt for a connector, and the
article agrees ('an incident source').
Org-mode cycle time (investigated in a separate session): the
repo-auditor agent's 'tools:' frontmatter restricted it to file tools,
so per-repo audits structurally could not fetch Jira/Confluence via MCP
— every org repo showed 'no tracker connector provided' even when the
orchestrator's own probe reached Jira. The restriction is removed (the
agent inherits the full toolset), the connector shape gains an optional
in_progress_at (Jira: expand=changelog) so the cycle-time headline can
actually compute median in-progress→done, and SKILL.md explains the
computation. MTTR behavior confirmed correct (incident connector only).
SKILL authoring hygiene: org reach 'ai_tooling' summarises with counts
only (never repo-name enumeration); recommendations must cite the check
they actually remediate; every authored ratio renders as a percentage
with one decimal, never a raw float.
Harness: claude -p sessions see the operator's user-scope MCP servers
even with --setting-sources project, so a test audit could pull live
Jira data. The harness now passes --strict-mcp-config by default
(--allow-user-mcp opts out).
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
…s --target
The harness previously died on a non-git --target, so org-mode audits could only be launched by hand — bypassing the blank-slate phase prep and engine-compliance guard (which is how a same-day rerun silently re-presented a previous engine's artifacts). Org folders are now first-class: children with .git are enumerated, and provenance records the repo count instead of a target commit.
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
…confidently
A broken measurement must read as "couldn't measure", never as a measured verdict.
audit_core: a metric that throws or an unknown metric= id now routes its categories to SKIP with a metric-error note and confidence 0 (was: silent FAIL at confidence 1 with empty evidence); aggregate() treats any finite score as explicit so a patched score of 0 is no longer re-inflated to the status default; patchJudgments rejects statuses outside PASS/WARN/FAIL/SKIP; unparseable dimension JSON and unreadable prior audit.json are logged instead of silently dropped; corrupted collector artifacts report "unreadable", not "not found"; coverage is null (not 0) when no weight is applicable; CI platform naming defers to ciPlatformName (single source of truth); reliability confidence unified on HIGH/MED/LOW.
git collector: collect() probes `git rev-parse --git-dir` first and emits available:false with the real error on broken environments (was: available:true with all-zero stats scored at full confidence); run() classifies expected-empty exit 1 (silent) vs fatal 128/ENOENT/EACCES/ENOBUFS (logged + artifact flipped unavailable); commit-less repos no longer spam breadcrumbs. AI-attribution grep runs under --extended-regexp so the (Windsurf|Cascade) alternation matches.
ci collector: connector-only path no longer claims a CI config was detected.
languages: evidence labels derive from the extensions actually matched.
New Bitbucket "(pull request #N)" squash fixture.
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
Empirically verified false signals, each now pinned by a regression test: .env pattern no longer matches process.env/os.environ (AIS-07 false PASS); gitignore check accepts .env* / .env*.local (Next.js/CRA defaults); dead \b-before-@ regexes fixed so @app.route/@RestController/express()/FastAPI() register (DOC-03 wrong SKIP, validation/rate-limit detectors); catch-opener matches K&R `} catch (`; PEP 508 env markers no longer read as the literal "$1" (unpinned); RAW_SQL requires SQL continuations so UI copy like "Delete item" stops failing ARCH-04; mutation routes require a router-ish receiver (cache.delete/axios.post excluded); RSpec/Jasmine spec/ dirs no longer earn SDD-04 spec-driven credit; prose "go"/"node" no longer count as tech mentions (SDD-03); import-graph layering WARNs on 1 violation and FAILs on 2+ as documented; single-token filenames count as compatible with every lowercase convention; single-service repos SKIP the per-service-README check; detached-HEAD pseudo-branch entries filtered; schema/namespace hosts exempt from the TLS origin count; md5/sha1 flag requires password context, not any "hash"; pandas/numpy alone no longer classify a repo as an ML project.
iterFiles: 512 MB maxBuffer (ENOBUFS on big monorepos silently flipped topology flags off), path-containing globs now work via find -path, and the promised JS-walk fallback exists. makeResult clamps score/confidence to [0,1]. Message-less assert.ok calls in the PAI/QA test files now name their contracts. Unused RANGED_RX deleted.
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
…e/MTTR
Three metrics hardcoded score 1.0 and awarded full weight regardless of the measured value: review rework (DF-05) now bands avg commits/PR, issue throughput (ADP-18) scores 0 at zero resolved and bands resolved/week, pipeline duration (ADP-16) SKIPs on null and bands the duration. CI pass rate guards empty runs (0/0 NaN poisoned audit_total). MTTR keeps the git proxy's minimal reliability even when an incident source exists — the value never came from incident data. Sub-task split averages over all parent-eligible tickets so the best-case anchor is reachable and one epic can't dominate. Doc coverage loses its 0.8/0.6 award cliff (continuous score modulates the weight). Tooling depth path matching is boundary-safe (.awos-legacy no longer matches .awos).
Shared readArtifact() in metrics/_base.ts: every metric now degrades to SKIP with the parse error in its note when a collector artifact is corrupt, instead of throwing into a confident FAIL. AST init failures carry the real error into the SKIP note. categories_awarded typed number[].
Org report: rollup carries each repo's cycle_time and mttr display values into the portfolio per_repo rows (the Repositories table rendered these columns 100% blank because the fields were never populated). PR cycle time prefers tracker tickets with changelog-derived in_progress_at -> resolved_at intervals and falls back to the git branch-lifetime proxy — which also rescues squash-merge repos; connector-shapes.md documents the field.
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
…isibility
Markdown delivery table no longer prints the literal "undefined" for a valueless ungated row; the evidence tooltip drops its double HTML-escape (tip() already escapes); a new mdCell() helper escapes pipes/newlines in every untrusted Markdown table cell so an LLM-authored title can't corrupt the table; the org Repositories table renders the newly-carried cycle_time/mttr values; Merges/LOC tooltips say "per week" to match the displayed rate; the header schema comment documents every top-level field the orchestrator must preserve and points at the design doc that exists.
PENDING_JUDGMENT is counted and surfaced explicitly in both report formats (amber chip in HTML, suffix in MD) instead of masquerading as SKIP — an unpatched headless run is now visibly unfinished. Unknown statuses warn on stderr.
cli.ts: local AuditJson duplicate renamed to a parse-boundary ParsedAudit; usage strings list all 11 verbs.
New adversarial escaping suite (hostile strings through evidence, insights, recommendations, repo names — pins single-escaping and intact MD tables) and patch-judgment/render CLI dispatch tests (stdin "-", invalid JSON, bad --format, missing --out-dir).
Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
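A Markdown table-cell escaper in the spirit of the mdCell() helper described above might look like the sketch below. It is an assumption-laden illustration, not the engine's implementation: replacing newlines with a space, escaping backslashes, and mapping a missing value to an empty string are all choices made here for the example.

```typescript
// Escape an untrusted value for use inside one Markdown table cell.
function mdCell(value: unknown): string {
  return String(value ?? "") // a missing value renders as empty, never "undefined"
    .replace(/\\/g, "\\\\") // escape backslashes first so later escapes stay intact
    .replace(/\|/g, "\\|") // a bare pipe would start a new column
    .replace(/\r?\n/g, " "); // a newline would end the table row
}

// Usage: build one table row from untrusted strings.
const row = `| ${["Title | with pipe", "line one\nline two", undefined].map(mdCell).join(" | ")} |`;
console.log(row);
```

Running every untrusted cell through one helper keeps the escaping rule in a single place, which is what makes an adversarial test suite against it meaningful.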
Nothing typechecked the engine (tsx and esbuild both strip types), and the JSON-artifact shapes were declared independently in audit_core, render, cli, and org_rollup — with live drift: render's status union lacked INFO and PENDING_JUDGMENT, cli redeclared AuditJson, render read undeclared fields through index-signature casts. `npm run typecheck` (tsc --noEmit) now runs in CI next to the dist gate, and a new artifact_types.ts is the single source for CheckStatus, reliability vocabularies, Check/DimensionArtifact/AuditJson/ PerRepoSummary/OrgConnections (type-only imports, zero bundle cost). The typecheck immediately caught a real bug: `cli.js collect tracker <repo>` passed the git tunables object as the connector payload, fabricating an available:true tracker artifact with zero tickets; the collector registry is now typed so only git receives its options. CI dist gate stages the directory first (git add -A + diff --cached) so an untracked build output — a newly bundled grammar — can no longer slip past the sync check. dist/ rebuilt from the fixed sources. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
plugins/awos/README.md described the removed architecture — per-dimension context windows, a PASS/WARN/FAIL deduction table, A-F grades, an 8-dimension list; rewritten from the engine model (13 dimensions, additive weighted scoring, JSON artifacts, both reports always rendered). CLAUDE.md corrected: patch-judgment added to the verb list, dist gate described as currently non-blocking, injection described truthfully (present but does not fire in plugin skills — Step 5 is the mechanism), four test layers, no more "3-tab"/per-dimension-phase/history annotations. ai-sdlc-adoption.md check ids now match what the engine emits (ADP-01..06, ADP-14..25 — the doc used a private ADP-G*/C*/I* scheme a report reader could never find). SKILL.md: contiguous step numbering, "no per-dimension fan-out" qualified (org repo-auditors and the judgment pass are sanctioned), one bold rule kept, org headline figures copied from the rollup so the header can't disagree with the portfolio cards. Judgment arrays are written inside the audit output dir — repo-auditor.md previously had every concurrent org auditor share /tmp/judgments.json, letting one repo's verdicts patch another. Dissolved end-to-end-delivery/security references replaced; detectors/ and metrics/ READMEs match the real registry and result shapes; ORG-02/ORG-03 standards definitions match the contributor-weighted rollup; harness README documents the per-repo/<repo>/ layout, org-folder --target, --model and --allow-user-mcp. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
A claude run that exited non-zero or errored printed "run complete" and exited 0 while the fallback render polished its partial artifacts; it now prints an INCOMPLETE banner, records partial:true and judgments_patched in run-meta, and exits 1. Engine compliance counts actual audit-core tool_use events (plus the executed-injection marker) instead of substring hits the prompt text itself satisfies. Marketplace repoint failures die with the captured stderr and restore failures warn loudly; restore writes each config file's own original value instead of km_source into both. Org repo count globs per-repo/*/audit.json (was per-repo/*.json — always 0). standards-linkcheck classifies a deep link that redirects to the site root as DEAD (the most common way doc links die previously passed). build-engine exits 1 when the core wasm or any bundled grammar is missing instead of warning and shipping a broken bundle. lint-prompts assertions that could not fail now pin real content: the cli render invocation, the no-individual- attribution privacy sentence, additive weighted points, org-portfolio.json. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
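The redirect-to-root rule in standards-linkcheck can be sketched as a small classifier. The function name, its parameters, and the two-value verdict are assumptions made for illustration; the real tool may track more states.

```typescript
type LinkVerdict = "OK" | "DEAD";

// Hypothetical classifier: `requested` is the link as written in the
// standards file, `finalUrl` is where the HTTP client ended up after
// following redirects, `status` is the final HTTP status code.
function classifyLink(requested: string, finalUrl: string, status: number): LinkVerdict {
  if (status < 200 || status >= 400) return "DEAD";
  const isRoot = (u: URL): boolean => u.pathname === "/" || u.pathname === "";
  const from = new URL(requested);
  const to = new URL(finalUrl);
  // A deep link that lands on the site root has lost its page: the server
  // answered successfully, but the content the link pointed at is gone.
  if (!isRoot(from) && isRoot(to)) return "DEAD";
  return "OK";
}
```

A link that was already pointing at the root is left alone, since landing on the root is then the expected outcome.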
The pr-review/pr-comments-address skills write their drafts and plans to review/; ignore the folder so a local review trail can't ride into a PR. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
… honest gated notes, org-parent guard Four org-report defects from the provectus-barhopping runs, root-caused against the live Atlassian MCP: Gated rows lied about the cause: "— (needs ticketing connector)" rendered even when Jira was connected but tickets lacked status-transition history. DeliveryMetric gains an optional note rendered as "— (<note>)"; SKILL.md instructs authoring the real state (history not fetched / zero resolved in window / partial fetch) and forbids the needs-connector default when a tracker was reachable. Cycle time and MTTR now carry reader-grade tooltips on the headline rows and the org Repositories column headers, explaining exactly what data the metric needs and why it can be gated with a tracker connected. The g5 squash-repo SKIP reason names the changelog gap precisely. The Jira MCP provides everything the metrics need — the prompts just never asked for it: searchJiraIssuesUsingJql caps at 100/page (the "exactly 100 tickets" and the run-to-run count drift) and paginates via nextPageToken; changelogs come only from per-issue getJiraIssue(expand: "changelog"). connector-shapes.md now carries the concrete recipe: paginate to completion with stable ordering + computeIssueCount, then a parallel changelog pass over the ~50 most recently resolved tickets, mapping in_progress_at to the first transition into an In-Progress-CATEGORY status (verified: tickets go Backlog -> To Do -> In Review -> Done without a literal "In Progress"). Tracker artifacts must record a fetch_meta completeness block; every tracker-consuming metric surfaces "partial tracker fetch: X of Y tickets" in its reliability note. repo-auditor.md states both requirements inline so org runs stop shipping single-page fetches. judgments_patched=false root cause: the pre-scope audit-core pass audited the non-git org PARENT folder, leaving an unpatchable stray audit at the output root. 
audit-core now detects an org parent (not a work tree, >=2 immediate git children), prints why, and exits 0 without writing artifacts. Org headline reach.contributors is pinned to the single-repo shape "<active> active (of <total> in window, 90d)" (sums across repos) instead of the improvised "39 active across 8 repos". Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
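The in_progress_at mapping described above (first transition into an In-Progress-category status, not a status literally named "In Progress") might be sketched as follows. The shapes are simplified and hypothetical: a real Jira changelog nests transitions under per-issue history entries and does not carry the status category inline, so the category lookup table here is an assumption.

```typescript
// Simplified, hypothetical shapes for illustration only.
interface StatusTransition {
  at: string; // ISO-8601 timestamp of the transition
  toStatus: string; // name of the status the ticket moved into
}

type StatusCategory = "todo" | "in_progress" | "done";

// Returns the timestamp of the first transition into any status whose
// category is in_progress, or null when the ticket never entered one.
function firstInProgressAt(
  transitions: StatusTransition[],
  categoryOf: Record<string, StatusCategory>,
): string | null {
  const sorted = [...transitions].sort((a, b) => Date.parse(a.at) - Date.parse(b.at));
  const hit = sorted.find((t) => categoryOf[t.toStatus] === "in_progress");
  return hit ? hit.at : null;
}
```

With a workflow of Backlog -> To Do -> In Review -> Done, mapping "In Review" to the in_progress category is what yields a start timestamp even though no status is named "In Progress".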
What the skill does and the engine/orchestrator split, standards.toml as the scoring model in data, headless testing via the audit-test-harness, and the maintenance paths (standards-refresh skill, adding dimensions/metrics, prompt edits, versioning). Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
Root scripts/ is in the installer copy table (scripts/ -> .awos/scripts/, overwritten on update), so build-engine.mjs and the standards-linkcheck pair were being copied into every user's project despite being maintainer-only tooling — users run the prebuilt dist/ and never lint standards.toml. Moved all three to tools/ai-readiness-audit/ (dev-only, never copied), fixed their internal repo-root/default-path resolution for the new depth, and wired the linkcheck tests into npm test as test:audit-tools (previously they matched no test glob and only ran by hand). create-spec-directory.sh stays — it is genuine framework product. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
run_audit_test.py and compare_audit_runs.py become run_audit_test.ts / compare_audit_runs.ts (plus harness_lib.ts for the pure helpers), run via the repo's existing tsx toolchain — same language as the engine, shared shapes, one less runtime prerequisite. All CLI flags, the marketplace repoint/finally-restore, phase seeding/stashing, the engine-compliance guard with retry/salvage, partial-run detection, token accounting across result segments, and the archive/run-meta layout are ported unchanged so old and new runs stay comparable. One deliberate fix over a literal port: die() throws instead of process.exit, which in Node would skip the finally block and leave the marketplace pointed at a worktree. New: a live progress log ([MmSSs]-stamped Bash/Agent tool events, the skill's progress emissions, a 60s heartbeat; --quiet to silence), and a final summary block with wall time (NmSSs), token counts, cost in dollars, turns, compliance/judgments verdicts, and the absolute archived report.html path(s) (org + per-repo). run-meta.json gains report_html. Invoke via npm run audit:test / audit:compare; pure helpers covered by tools/audit-test-harness/ harness.test.ts wired in as test:harness. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
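The die()-throws fix is easiest to see in a small sketch. The names below (`FatalError`, `runWithRepoint`, the log entries) are hypothetical stand-ins for the harness's repoint and restore steps, not its actual code.

```typescript
class FatalError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FatalError";
  }
}

// Throwing (rather than calling process.exit) unwinds the stack normally,
// so any enclosing finally block still runs.
function die(message: string): never {
  throw new FatalError(message);
}

// Hypothetical wrapper: "repoint" and "restore" stand in for the real
// marketplace repoint and restore steps.
function runWithRepoint(log: string[], body: () => void): number {
  log.push("repoint");
  try {
    body();
    return 0;
  } catch (err) {
    if (err instanceof Error && err.name === "FatalError") {
      log.push(`fatal: ${err.message}`);
      return 1; // exit code for the top level to hand to process.exit
    }
    throw err;
  } finally {
    log.push("restore"); // always runs, even when die() fired
  }
}
```

Only the outermost caller converts the returned code into a process exit, after every cleanup step has already run.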
…anaged artifacts
The headless orchestrator repeatedly skipped the deterministic audit-core pass and hand-computed the audit (barley 2026-07-03: 3 attempts, ~45 min, ~3x cost), citing SKILL.md's dead load-time pre-run narrative to justify it. Fix the prompt at the root and make the skip impossible to complete:
- SKILL.md: delete the never-executing load-time !`...` injection and every "run audit-core only if audit.json is missing" conditional; Step 5 is now the unconditional first scoring action (pre-existing audit.json is stale output to overwrite), with the prohibition stated at the decision point. Frontmatter disallows Edit/NotebookEdit/ScheduleWakeup while the skill is active (verified enforced for plugin skills on Claude Code 2.1.199).
- Engine provenance: audit-core stamps audit.json and every dimension JSON with engine.generated_by; patch-judgment, patch-report, and render refuse an unstamped single-repo audit, and rollup skips unstamped per-repo audits: a hand-assembled audit cannot become a report.
- Artifacts are engine-managed end to end: new report-context verb emits the flattened authoring context (check values/hints, git window stats, tracker fetch meta) and new patch-report verb merges the authored headline/insights/recommendations into audit.json and emits recommendations.md from the same array; the audit-core/enrich summary lists pending_judgment_checks. The orchestrator authors only judgments.json and report-blocks.json; no inline-script inspection or direct artifact edits remain in the flow.
- Tests: provenance regression suite, patch-report/report-context CLI tests, lint contracts pinning the no-pre-run/stale/provenance/disallowed-tools wording; dist/ rebuilt.
Verified: compliance smoke 3/3 PASS (audit-core called, provenance intact, judgments patched, reports rendered, zero hand-compute/hand-writes/fan-out).
Claude-Session: https://claude.ai/code/session_019HGet8jGBoU9gT5wrYZ2Fp
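The provenance gate can be pictured with a short sketch. Only the engine.generated_by field comes from the commit message; the function names, the error text, and the stamp value used in the usage note are assumptions.

```typescript
// Hypothetical minimal view of an audit artifact; real artifacts carry
// many more fields.
interface AuditLike {
  engine?: { generated_by?: unknown };
  [key: string]: unknown;
}

// True only when the artifact carries a non-empty engine.generated_by stamp.
function isStamped(audit: AuditLike): boolean {
  const by = audit.engine?.generated_by;
  return typeof by === "string" && by.length > 0;
}

// Single-repo verbs refuse outright: a hand-assembled audit throws here.
function requireStamped(audit: AuditLike, verb: string): void {
  if (!isStamped(audit)) {
    throw new Error(`${verb}: refusing unstamped audit (missing engine.generated_by)`);
  }
}

// Rollup-style handling: unstamped per-repo audits are skipped, not fatal.
function stampedOnly<T extends AuditLike>(audits: T[]): T[] {
  return audits.filter(isStamped);
}
```

For example, `requireStamped({ engine: { generated_by: "audit-core" } }, "render")` passes (the stamp value is made up for this sketch), while `requireStamped({}, "render")` throws.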
…readiness-audit/qa
- Move the QA harness from tools/audit-test-harness/ to tools/ai-readiness-audit/qa/ so one folder holds both the engine build tooling and the QA tooling for the one audit command.
- New compliance_smoke.ts (npm run audit:smoke): N headless claude -p runs against a tiny generated fixture repo, verdict per run from hard signals only: audit-core invoked, provenance stamp present, judgments patched, reports rendered (not hand-written), no python/node inline compute, no scoring-JSON hand writes (judgments.json/report-blocks.json exempt), no per-dimension fan-out, no stall-on-question. Fail-fast by default: stop at the first failing run instead of paying for the same failure again (--keep-going for deliberate rate measurement).
- Full harness: engine-skip retries now append a corrective system prompt (a bare relaunch re-confabulates the same skip from leftover artifacts); the spoofable injected_audit_core compliance signal is removed; only an actual audit-core Bash invocation counts.
- Marketplace repoint helpers extracted to harness_lib.ts (shared by the harness and the smoke tool); unit tests for the new transcript signals.
Claude-Session: https://claude.ai/code/session_019HGet8jGBoU9gT5wrYZ2Fp
…urce-probe transparency
Root cause of the missing Cycle time: harness isolation (--strict-mcp-config)
stripped ALL MCP servers, and a separate 2026-07-02 run fetched 994 Jira
tickets without changelogs yet still rendered the default "needs ticketing
connector" next to "Connected: Jira via Atlassian MCP". Three fixes, one
principle — the audit assesses the project, not the auditor's environment:
- Project-scope MCP discovery (harness): the target's own declared servers
(.mcp.json / mcp.json / .vscode/mcp.json / .cursor/mcp.json, org folder +
every repo subdir, VS Code {servers} shape normalized, collisions suffixed)
are merged and passed back explicitly via --mcp-config, which
--strict-mcp-config honors. User-scope servers stay excluded by design.
- Engine-derived gated rows: audit-core/enrich/aggregate compute
audit.derived_delivery (cycle-time median from tracker tickets'
in_progress_at→resolved_at, plus the honest gated note like "Jira connected
— per-ticket status history not fetched") and the renderer appends those
rows, ignoring authored ones — the headline can never contradict the
Connections & Sources section again.
- Source-probe transparency: a new source_probes report block (patch-report)
records what was searched per unreachable source (mcp configs, CLIs, auth
state) and renders into "Missed / limited", replacing the bare "supply a
connector" hint.
- CLI channels (skill): acli/gh/glab are sanctioned measurement channels —
gh run list fills collected/ci.json (barley now scores real pipeline
metrics), code-host issues can serve as a minimal tracker, and the acli
Jira recipe derives the project key from commit-message ticket prefixes;
recipes in connector-shapes.md → "CLI channels".
- Harness robustness: marketplace repoint snapshots/writes the whole source
object (a github-shaped entry mutated only via .path produced a
"corrupted installLocation" rejection); smoke exempts sanctioned
collected/*.json connector writes.
Verified on barley: CI connected via gh (200 runs), tracker/docs missed with
full probe trails in the report, derived_delivery consistent with sources,
engine-compliant run end to end.
Claude-Session: https://claude.ai/code/session_019HGet8jGBoU9gT5wrYZ2Fp
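The project-scope MCP merge described in this commit could be sketched roughly as below. The function names and the numeric collision suffix (-2, -3, ...) are assumptions; the commit message only says that the VS Code {servers} shape is normalized and that collisions are suffixed.

```typescript
type McpServers = Record<string, unknown>;

interface McpConfigFile {
  mcpServers?: McpServers; // shape used by .mcp.json style files
  servers?: McpServers; // shape used by .vscode/mcp.json
}

// Normalizes both file shapes to a plain name -> definition map.
function normalizeServers(config: McpConfigFile): McpServers {
  return config.mcpServers ?? config.servers ?? {};
}

// Merges several config files into one; a repeated server name gets a
// numeric suffix instead of silently overwriting the earlier definition.
// The suffix scheme is an assumption made for this sketch.
function mergeMcpConfigs(configs: McpConfigFile[]): { mcpServers: McpServers } {
  const merged = new Map<string, unknown>();
  for (const config of configs) {
    for (const [name, definition] of Object.entries(normalizeServers(config))) {
      let key = name;
      for (let n = 2; merged.has(key); n++) key = `${name}-${n}`;
      merged.set(key, definition);
    }
  }
  return { mcpServers: Object.fromEntries(merged) };
}
```

The merged object is the kind of payload that would then be written to a file and passed explicitly via --mcp-config.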
…harness
Net -835 lines with zero behavior change (1288 tests green). Applies the findings of the four-angle quality review (reuse, simplification, efficiency, altitude):
Shared infrastructure
- metrics/_base: skipMetric() replaces ~45 copy-pasted SKIP stanzas; makeMetricResult tail folded into an options object; squash-merge circuit-breaker note/reliability shared; readArtifact memoized (mtime+size) so git/tracker JSON parses once per pass, not 11x/6x; evaluateAppliesWhen() is the single applies_when interpreter.
- metrics/_score: shared median/mean/round1; clamp01 reused everywhere.
- detectors/_base: per-(repo, ignore-set) cached file listing replaces ~200-300 find spawns per pass; readTextSafe() replaces ~50 hand-rolled try/readFileSync blocks; hasMatch() gives boolean greps an early exit; DetectorResult.status typed as the PASS/WARN/FAIL/SKIP union.
audit_core / cli
- sources/source_windows/topology derive from one parsed-artifact map shared by auditCore and aggregate (the two copies had already drifted); patch verbs (aggregate, patch-judgment, patch-report, report-context) split into audit_patch.ts; dimensionFiles() iterator shared.
- Registries move to detectors/index.ts and metrics/index.ts; cli.ts is dispatch-only (fail()/readJsonArg()/standardsTomlPath() helpers, merged audit-core/enrich cases, static fs imports).
- metric verb derives its collectors from standards.toml sources instead of hardcoded id prefixes; org rollup reader moves to metrics/rollup_input.ts with the delivery check-id table derived from DELIVERY_SPECS and AI-tooling codes derived from standards.
check_id single source of truth
- Every standards.toml category now carries check_id (62 injected from the dimension .md headings); runtime md parsing (parseCheckIds) is deleted; Layer-1 lint enforces presence and md<->toml agreement.
Perf
- enrich reuses repo-derived checks (detectors, AST metrics) from the per-dimension artifacts and re-scores only connector-affected categories: an enrich pass drops from a full re-audit to ~50ms on a small repo.
- git collector: latestCommitDate computed once per collect (was 3 spawns); one all-history squash-merge scan folded in memory for both the unbounded and windowed consumers (was 2 full log passes).
- topology: duplicate flag expressions computed once; boolean code greps early-exit.
render
- renderHtml's ~940-line closure split into top-level section functions; md/html micro-duplications share helpers; canonical COLLECTOR_SOURCES lives in artifact_types.ts.
Tests / QA harness
- Shared fixture factories in tests/helpers.ts (gitRaw, trackerArtifact, makeCheck/makeDim/makeAudit, makeCheckRecord, tmpDir, gitAs); the ~59 repeated git-raw literals collapse; the two real-repo auditCore shape tests share one run.
- QA harness: single stream-json transcript walker, shared claude-spawn wrapper, one-spawn salvage render (--format both), performRun() replaces 8 mutable outer variables, dead exports dropped.
CLAUDE.md updated: adding a dimension is four touch points (md, toml, detector module, registry index), and enrich's reuse semantics are documented.
Claude-Session: https://claude.ai/code/session_01Ld2EFkQ3DuoFGfvXXqLKZF
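The mtime+size memoization mentioned for readArtifact can be sketched as a small cached reader. Everything here is hypothetical: the names are invented, and the stat and read functions are injected so the sketch runs without touching the filesystem (in real use they could wrap fs.statSync and fs.readFileSync).

```typescript
interface FileStat {
  mtimeMs: number;
  size: number;
}

// Builds a JSON reader that re-parses a file only when its mtime or size
// has changed since the last read. All names are hypothetical.
function makeCachedJsonReader(
  stat: (path: string) => FileStat,
  read: (path: string) => string,
): (path: string) => unknown {
  const cache = new Map<string, FileStat & { value: unknown }>();
  return (path: string): unknown => {
    const { mtimeMs, size } = stat(path);
    const hit = cache.get(path);
    // Cache hit: same fingerprint, so return the already-parsed value.
    if (hit && hit.mtimeMs === mtimeMs && hit.size === size) return hit.value;
    const value: unknown = JSON.parse(read(path));
    cache.set(path, { mtimeMs, size, value });
    return value;
  };
}
```

A stat call is far cheaper than a parse, so keying on (mtime, size) lets many metrics share one parsed artifact while still noticing a rewritten file.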
…r, short gated cycle-time note
Findings from the 2026-07-03 provectus-barhopping org QA run:
- Org report rendered a 96-row Dimensions table (12 dims x 8 repos). Root cause was SKILL.md itself: the org-assembly step instructed the model to include "dimensions (aggregated dimension data from all per-repo audits)" in org-portfolio.json, and the renderer rendered whatever it was given. SKILL.md now forbids the key, AuditJson.dimensions is optional (absent on org portfolio JSON), and both renderers ignore top-level dimensions in org mode - an injected concatenation can no longer become a report table.
- Same run audited repos 2-3x: three repos got two repo-auditor subagents each, and the orchestrator additionally ran audit-core/enrich for five repos in its own context. SKILL.md org branch now states each repo is audited exactly once by exactly one subagent, the orchestrator never runs the engine itself in org mode, and re-dispatch is allowed only for a repo whose audit.json is missing after its subagent returned.
- Per-repo HTML back-link now targets ../../report.html#repos (the org Repositories heading carries id="repos"), returning the reader to the table they navigated from instead of the org report top.
- A gated tracker headline row with a connected-but-unmeasurable tracker now shows the short "- (no tickets data)" placeholder; the full explanation moved to the value tooltip (HTML) and the Connections & Sources tracker line (both formats).
Tests: org-mode no-Dimensions contract (injected dims ignored; renders without the key), #repos anchor + per-repo back-link, short-placeholder + full-note placement, hostile-escaping fixture split into single-repo and org variants. dist/ rebuilt.
Claude-Session: https://claude.ai/code/session_01Ld2EFkQ3DuoFGfvXXqLKZF
…pecheck The Missed/limited probe-log fixture omitted history_available_days, which SourceSummary requires; local runs passed because tsx does not typecheck, but CI's tsc --noEmit gate does. Claude-Session: https://claude.ai/code/session_01Ld2EFkQ3DuoFGfvXXqLKZF
Re-architects /awos:ai-readiness-audit from a fixed-ceiling A–F/0–100 grade into an additive, weighted, file-defined capability model with a deterministic measurement engine, and adds a single-page drill-down HTML report. (Supersedes the original Phase 0+A scope of this PR, now the full re-architecture.)

What changed
- references/standards.toml defines every capability category (code/weight/definition/applies_when/source). Score = Σ awarded weights (uncapped) + a coverage % "relative to today's standard". No grade, no 0–100.
- method: computed/detected are evaluated by a deterministic TypeScript engine (the auditor uses the verdict verbatim); only genuinely-semantic judgment checks use an LLM against a fixed rubric. This eliminates the ~40-point run-to-run variance the old LLM-judged audit produced.
- dist/cli.js (verbs: collect/detect/metric/standards/render/rollup/audit-core/aggregate/enrich/progress) runs with plain node, no install at audit time. Layers: 4 collectors (git/ci/tracker/docs) · per-dimension detectors · ADP metrics (DORA-style: lead time, deploy freq, change-fail, MTTR git-proxy, AI-attribution, work-mix, …) · complexity/scale via bundled web-tree-sitter (multi-language).
- audit-core writes per-dimension JSON + audit.json in one pass; report.md + the self-contained single-page report.html (hover hint on every number, hash-routed drill-down per dimension) are rendered deterministically from JSON, nothing dropped.

Metric correctness & taxonomy restructure
Min/max fixture testing surfaced three classes of metric defects (evidence dossier: docs/audit-metric-issues/). All fixed in four phases, with the guiding acceptance criterion that every scored metric must reach 0 on a worst-case project and its max on a best-case one:
- … context/audits/ output (self-pollution inflated every run by +12 pts); orchestrator score patches are clamped to [0,1] and reconciled with status; DOC-06 carries its own evidence line; duplicate SBP-06 check id split; object values render as k=v, never [object Object].
- … [meta].dimension_order): industry-standard engineering first, AI-frontier last, descriptors at the end; each dimension's description renders as a hover tooltip on the report summary. Weight spread across scored dimensions went from 16–139 to 27–86.
- … merge-commit/squash/mixed; merge-record proxies (lead time, PR cycle, review rework) SKIP with a connector-pointing reason on squash repos instead of reporting confident wrong numbers. On the real evidence repo this turned "19 merges, all one maintainer" into 114 merge events across the 4 actual PR authors.

Validation
Full gate green: engine 1002 / lint 83 / installer 42 / fixtures 5, prettier + build clean. Run end-to-end on a real repo (onex-discovery-api): produces the full JSON artifact set + report.html; SBP-08 caught 2 real except A,B: syntax bugs; squash-merge detection validated against the same repo's Azure DevOps history.

Notes
- dist/ is ~24 MB (broad web-tree-sitter grammar set, by request), trimmable to a core set later.
- SKILL.md preflights/invokes node as the supported runtime; the bundled dist/cli.js also smoke-runs under Bun (incl. the tree-sitter wasm path), but the dev toolchain (node:test/tsx/esbuild) is Node-only.
- ${CLAUDE_SKILL_DIR} resolves the bundled CLI at audit time.
- awos-qa min/max fixtures for the new check ids (DF-*, DESC-*, AIS-*, AS-12..14, SBP-08..10, DOC-07, ARCH-07) to prove the 0→max criterion end-to-end.

See https://provectus.slack.com/archives/C09GCR80NC8/p1782392880789469 for more details.
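The additive scoring model from the "What changed" summary (score = Σ awarded weights, plus a coverage percentage) can be sketched as below. The shapes and names are hypothetical, and two details are assumptions rather than facts from this PR: that a continuous score in [0, 1] scales each weight, and that coverage is awarded weight divided by the weight of the categories that applied.

```typescript
// Hypothetical per-category result; the real artifact shape may differ.
interface CategoryResult {
  code: string;
  weight: number;
  score: number | null; // continuous score in [0, 1]; null when skipped
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Sums awarded weight with no cap, and reports coverage against the
// weight of applicable categories. Skipped categories are excluded from
// both numbers (an assumption of this sketch).
function rollupScore(categories: CategoryResult[]): { total: number; coveragePct: number | null } {
  let total = 0;
  let possible = 0;
  for (const c of categories) {
    if (c.score === null) continue;
    total += c.weight * clamp01(c.score);
    possible += c.weight;
  }
  const coveragePct = possible === 0 ? null : Math.round((total / possible) * 100);
  return { total, coveragePct };
}
```

Because the total is a plain sum, adding a new category to the standards file raises the achievable maximum without rescaling existing scores; the coverage percentage is what stays comparable over time.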