feat(audit): AI-SDLC measurement engine — weighted-category scoring + deterministic engine + 3-tab report (full re-architecture) by AlexanderMakarov · Pull Request #139 · provectus/awos

AlexanderMakarov · 2026-06-24T05:17:34Z

Re-architects /awos:ai-readiness-audit from a fixed-ceiling A–F/0–100 grade into an additive, weighted, file-defined capability model with a deterministic measurement engine, and adds a single-page drill-down HTML report. (Supersedes the original Phase 0+A scope of this PR — now the full re-architecture.)

What changed

- Weighted-category scoring (Phase A): references/standards.toml defines every capability category (code/weight/definition/applies_when/source). Score = Σ awarded weights (uncapped) + a coverage % "relative to today's standard". No grade, no 0–100.
- Determinism, the headline fix. Every category declares a method: computed/detected are evaluated by a deterministic TypeScript engine (the auditor uses the verdict verbatim); only genuinely-semantic judgment checks use an LLM against a fixed rubric. This eliminates the ~40-point run-to-run variance the old LLM-judged audit produced.
- TypeScript engine bundled by esbuild to a single committed dist/cli.js (verbs: collect/detect/metric/standards/render/rollup/audit-core/aggregate/enrich/progress); runs with plain node, no install at audit time. Layers: 4 collectors (git/ci/tracker/docs) · per-dimension detectors · ADP metrics (DORA-style: lead time, deploy freq, change-fail, MTTR git-proxy, AI-attribution, work-mix, …) · complexity/scale via bundled web-tree-sitter (multi-language).
- JSON is the source of truth. audit-core writes per-dimension JSON + audit.json in one pass; report.md + the self-contained single-page report.html (hover hint on every number, hash-routed drill-down per dimension) are rendered deterministically from JSON, with nothing dropped.
- Org mode (multi-repo discover → portfolio metrics + rollup), monthly history (value_series), progress/ETA (interactive + headless), headless-first (auto-generates HTML).
- Plugin + marketplace bumped to 2.3.0.
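The weighted-category scoring described above can be sketched roughly as follows. This is a minimal illustration, not the engine's code: the `CheckResult` shape, the idea that each check earns `weight × score`, and the weights in the sample data are all assumptions. Only the "sum of awarded weights plus a coverage ratio, with SKIP excluded" rule comes from the description.

```typescript
// Hypothetical shapes for illustration; the engine's real types are not shown here.
type Status = "PASS" | "WARN" | "FAIL" | "SKIP";

interface CheckResult {
  id: string;
  weight: number; // category weight (sample values below are invented)
  score: number; // assumed fraction of the weight earned, in [0, 1]
  status: Status;
}

interface ScoreSummary {
  points: number; // sum of awarded weights, uncapped
  applicableWeight: number; // total weight of checks that were not skipped
  coverage: number | null; // points / applicableWeight, or null if nothing applied
}

function summarize(checks: CheckResult[]): ScoreSummary {
  let points = 0;
  let applicableWeight = 0;
  for (const check of checks) {
    if (check.status === "SKIP") continue; // skipped checks leave the denominator
    points += check.weight * check.score;
    applicableWeight += check.weight;
  }
  const coverage = applicableWeight > 0 ? points / applicableWeight : null;
  return { points, applicableWeight, coverage };
}

const sampleSummary = summarize([
  { id: "QA-05", weight: 8, score: 0.5, status: "WARN" },
  { id: "DOC-06", weight: 4, score: 1, status: "PASS" },
  { id: "SBP-08", weight: 10, score: 0, status: "SKIP" },
]);
console.log(sampleSummary);
```

Because the skipped check contributes to neither the numerator nor the denominator, a repo that lacks the precondition for a check is not rewarded or penalised for it.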

Metric correctness & taxonomy restructure

Min/max fixture testing surfaced three classes of metric defects (evidence dossier: docs/audit-metric-issues/). All fixed in four phases, with the guiding acceptance criterion that every scored metric must reach 0 on a worst-case project and its max on a best-case one:

- Explicit bugs: vitest/supertest no longer classify unit suites as e2e (QA-05); test files excluded from the ARCH-05 naming check; the audit no longer scans its own context/audits/ output (self-pollution inflated every run by +12 pts); orchestrator score patches are clamped to [0,1] and reconciled with status; DOC-06 carries its own evidence line; duplicate SBP-06 check id split; object values render as k=v, never [object Object].
- SKIP-on-absence: ARCH-02, SDD-03/05/06, SEC-05, SBP-08 emit SKIP (excluded from coverage) when their precondition is absent, so an empty repo no longer reads as compliant.
- Taxonomy restructure: new Delivery Flow dimension (the DORA family: DF-01..07, out of the AI-SDLC Adoption grab-bag) and a new unscored Descriptors dimension (contributors/churn/complexity/scale/deps, INFO badge, weight 0; size describes a repo, it doesn't grade it). The broad Security dimension dissolved into Application Security (AS-12..14) and AI Security (renamed from Prompt & Agent Integrity, AIS-01..07); End-to-End Delivery dissolved into SBP/Documentation/Code Architecture (SBP-09/10, DOC-07, ARCH-07). Dimension order is data ([meta].dimension_order): industry-standard engineering first, AI-frontier last, descriptors at the end; each dimension's description renders as a hover tooltip on the report summary. Weight spread across scored dimensions went from 16–139 to 27–86.
- Squash-merge awareness: squash/rebase-merged PRs (GitHub/Azure DevOps/Bitbucket/GitLab formats) now count as merge events attributed to the PR author, with the workflow detected as merge-commit/squash/mixed; merge-record proxies (lead time, PR cycle, review rework) SKIP with a connector-pointing reason on squash repos instead of reporting confident wrong numbers. On the real evidence repo this turned "19 merges, all one maintainer" into 114 merge events across the 4 actual PR authors.

Validation

Full gate green: engine 1002 / lint 83 / installer 42 / fixtures 5, prettier + build clean. Run end-to-end on a real repo (onex-discovery-api): produces the full JSON artifact set + report.html; SBP-08 caught 2 real except A,B: syntax bugs; squash-merge detection validated against the same repo's Azure DevOps history.

Notes

- Committed dist/ is ~24 MB (broad web-tree-sitter grammar set, by request); trimmable to a core set later.
- Engine is validated on Node and SKILL.md preflights/invokes node as the supported runtime; the bundled dist/cli.js also smoke-runs under Bun (incl. the tree-sitter wasm path), but the dev toolchain (node:test/tsx/esbuild) is Node-only. ${CLAUDE_SKILL_DIR} resolves the bundled CLI at audit time.
- Follow-up: refresh the awos-qa min/max fixtures for the new check ids (DF-, DESC-, AIS-*, AS-12..14, SBP-08..10, DOC-07, ARCH-07) to prove the 0→max criterion end-to-end.

See https://provectus.slack.com/archives/C09GCR80NC8/p1782392880789469 for more details.

Self-contained, zero-context implementation spec capturing the full approved design (decision log, standards.toml schema, collector/metric contracts, phased task list) plus the CEO/CTO exec-deliverable mock. Supersedes the pre-pivot draft under docs/superpowers/ (git-ignored).


coderabbitai · 2026-06-24T05:17:48Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


📝 Walkthrough


AWOS audit metadata is bumped to 2.5.0, and the audit engine is expanded with deterministic collectors, registries, detectors, metrics, rollup logic, and HTML/Markdown rendering. Documentation, standards, and tests are updated to match the new weighted-category scoring and provenance flow.

Changes

AWOS audit engine refresh

| Layer / File(s) | Summary |
| --- | --- |
| Release framing, scoring model, and build wiring `.claude-plugin/marketplace.json`, `plugins/awos/.claude-plugin/plugin.json`, `package.json`, `.prettierignore`, `.gitattributes`, `.gitignore`, `.github/workflows/quality-check.yml`, `docs/design/*`, `plugins/awos/README.md`, `CLAUDE.md`, `plugins/awos/skills/ai-readiness-audit/{SKILL.md,scoring.md,output-format.md,report-template.md}`, `docs/design/2026-06-25-audit-hardening-design.md`, `docs/design/2026-06-26-audit-hardening-plan.md`, `docs/design/2026-06-26-audit-fairness-and-report-v2-design.md`, `docs/design/2026-06-26-audit-fairness-and-report-v2-plan.md`, `docs/design/2026-06-26-report-honesty-and-provenance-design.md`, `docs/design/2026-06-26-report-honesty-and-provenance-plan.md` | `version` is updated to `2.5.0`, build and workflow scripts are adjusted, and the design and product docs are rewritten around the deterministic engine, weighted scoring, and provenance-oriented reporting. |
| Engine foundations and shared registries `plugins/awos/skills/ai-readiness-audit/{generated.ts,generated.test.ts,frameworks.ts,frameworks.test.ts,ci_platforms.ts,ci_platforms.test.ts,languages.ts,languages.test.ts,progress.ts,agent_tools.ts,agent_tools.test.ts}`, `plugins/awos/skills/ai-readiness-audit/metrics/_ast.ts`, `plugins/awos/skills/ai-readiness-audit/detectors/_base.ts`, `plugins/awos/skills/ai-readiness-audit/collectors/README.md`, `plugins/awos/skills/ai-readiness-audit/tests/collector-base.test.ts` | Adds shared generated-path, framework-auth, CI, language, progress, agent-tool, and Tree-sitter helpers plus the base detector and collector contracts used across the engine. |
| Collectors, CLI wiring, and audit orchestration `plugins/awos/skills/ai-readiness-audit/{collectors/_base.ts,collectors/ci.ts,collectors/git.ts,collectors/docs.ts,collectors/tracker.ts,cli.ts,audit_core.ts}`, `plugins/awos/skills/ai-readiness-audit/tests/{ci-collector.test.ts,cli.test.ts}`, `plugins/awos/skills/ai-readiness-audit/ci_platforms.test.ts` | Adds collector implementations, CLI subcommands, audit-core aggregation, and command/collector smoke coverage. |
| AI tooling, architecture, documentation, and delivery detectors `plugins/awos/skills/ai-readiness-audit/detectors/{ai_development_tooling.ts,code_architecture.ts,documentation.ts,end_to_end_delivery.ts}`, `plugins/awos/skills/ai-readiness-audit/detectors/{ai_development_tooling_ai04.test.ts,code_architecture_arch06.test.ts,documentation_doc04.test.ts,det-end-to-end-delivery.test.ts}`, `plugins/awos/skills/ai-readiness-audit/dimensions/{ai-development-tooling.md,code-architecture.md,documentation.md,end-to-end-delivery.md}` | Adds the AI tooling, architecture, documentation, and delivery detectors, their tests, and the matching category metadata in the dimension specs. |
| Prompt integrity, repository security, and application security detectors `plugins/awos/skills/ai-readiness-audit/detectors/{prompt_agent_integrity.ts,security.ts,application_security.ts}`, `plugins/awos/skills/ai-readiness-audit/detectors/{det-prompt-agent-integrity.test.ts,prompt_agent_integrity_local.test.ts,security_sec05.test.ts,application_security_as03.test.ts,application_security_as06.test.ts,det-application-security.test.ts}`, `plugins/awos/skills/ai-readiness-audit/dimensions/{prompt-agent-integrity.md,security.md,application-security.md}` | Adds the prompt integrity, repository security, and application security detectors with matching dimension metadata and coverage. |
| QA, best-practices, SDD, and supply-chain detectors `plugins/awos/skills/ai-readiness-audit/detectors/{quality_assurance.ts,software_best_practices.ts,spec_driven_development.ts,supply_chain_security.ts}`, `plugins/awos/skills/ai-readiness-audit/detectors/{det-code-architecture.test.ts,det-documentation.test.ts,det-prompt-agent-integrity.test.ts,spec_driven_development_sdd05.test.ts}`, `plugins/awos/skills/ai-readiness-audit/dimensions/{quality-assurance.md,software-best-practices.md,spec-driven-development.md,supply-chain-security.md}` | Adds the QA, software-practice, spec-driven, and supply-chain detectors and updates the related dimension category metadata and tests. |
| Metrics, rollup, and rendering `plugins/awos/skills/ai-readiness-audit/{metrics/*,render.ts}`, `plugins/awos/skills/ai-readiness-audit/tests/{adp_g13_doc_coverage.test.ts}`, `plugins/awos/skills/ai-readiness-audit/tests/cli.test.ts` | Adds metric helpers, ADP metrics, org rollup, the doc-comment coverage metric test, and deterministic Markdown/HTML rendering. |
| Validation and engine tests `tests/lint-prompts.test.js`, `.claude-plugin/marketplace.json`, `plugins/awos/skills/ai-readiness-audit/tests/*.test.ts` | Exercises the prompt contracts, version alignment, and detector/collector/metric wiring across the new audit dimensions. |

Sequence Diagram(s)

sequenceDiagram
  participant cli_ts
  participant audit_core_ts
  participant collectors_ts
  participant metrics_ts
  participant render_ts
  cli_ts->>audit_core_ts: auditCore(repoPath, outDir, DETECTORS, METRICS, standardsPath)
  audit_core_ts->>collectors_ts: write collected/*.json
  audit_core_ts->>metrics_ts: compute per-dimension JSON
  audit_core_ts->>cli_ts: return AuditCoreSummary
  cli_ts->>render_ts: renderMarkdown(audit.json) or renderHtml(audit.json)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~90+ minutes

Possibly related PRs

provectus/awos#115: Extends the same ai-readiness-audit dimension metadata and category tagging that this PR continues to refine.
provectus/awos#116: Also changes the AWOS audit orchestration and prompt contracts that this PR updates.

Suggested labels

enhancement

Suggested reviewers

kmakarychev-dev
workshur

Poem

I thumped through versions, bright and new,
🐰 With weighted scores and render glue.
I tucked the metrics in my burrow,
Then hopped through tests from dawn till morrow.
The audit garden blooms in code.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title matches the PR’s main theme: weighted scoring, deterministic engine work, and report redesign. |


coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (1)

tests/lint-prompts.test.js (1)

1238-1241: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Parse only check headings when building dimension blocks.

Line 1239 splits on every ### heading, but the contract is about ### CODE-NN check blocks. This is brittle if non-check subheadings are added later.

Suggested fix

-    // Split into check blocks by the "### CODE-NN:" headings.
-    const blocks = body.split(/^### /m).slice(1);
+    // Split into check blocks by the "### CODE-NN:" headings.
+    const blocks = body.split(/^###\s+[A-Z]+-\d+:/m).slice(1);
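To see what the narrower pattern changes, here is a small sketch run against an invented document body. The sample headings are hypothetical. Note one side effect of the suggested regex: because `split` consumes the matched text, each block loses its `### CODE-NN:` prefix. The lookahead variant at the end is an alternative of my own, not part of the review suggestion, for the case where the loop still needs the check id.

```typescript
// Invented sample body: two check headings and one non-check subheading.
const sampleBody = [
  "## Checks",
  "### ARCH-01: Layering",
  "details",
  "### Notes",
  "not a check",
  "### ARCH-02: Naming",
  "more",
].join("\n");

// Original: splits on every "### " heading, so "### Notes" becomes its own block.
const broadBlocks = sampleBody.split(/^### /m).slice(1);

// Suggested fix: splits only on "### CODE-NN:" headings, but the heading text
// matched by the regex is removed from the resulting blocks.
const narrowBlocks = sampleBody.split(/^###\s+[A-Z]+-\d+:/m).slice(1);

// Alternative (assumption, not from the review): a zero-width lookahead keeps
// the full heading, including the check id, at the start of each block.
const blocksWithIds = sampleBody.split(/^(?=###\s+[A-Z]+-\d+:)/m).slice(1);

console.log(broadBlocks.length, narrowBlocks.length, blocksWithIds.length);
```

With either narrow form, the "### Notes" subsection stays attached to the check block that precedes it instead of being treated as a check.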

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/lint-prompts.test.js` around lines 1238 - 1241, The block parsing in
the lint prompt test is too broad because it splits on every “###” heading
instead of only the check-block headings. Update the logic around body.split and
the subsequent block-processing loop in the test to match only “### CODE-NN”
headings, so extra non-check subheadings are ignored. Keep the rest of the
dimension-block extraction flow the same, but ensure the heading filter is
specific to the contract enforced by this test.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/design/ai-sdlc-exec-deliverable.md`:
- Line 16: The Markdown ASCII diagram fences in the document are unlabeled,
which breaks lint/tooling consistency; update the fenced blocks in the affected
sections to use a language tag such as text. Make this change for each ASCII
block by editing the relevant fenced blocks in the document so the existing
content stays the same but the fences are explicitly labeled, matching the style
used by other Markdown examples.

In `@docs/design/ai-sdlc-measurement-and-scoring-plan.md`:
- Line 40: The implementation contract in the metrics section uses a
machine-local absolute path, which should be removed for portability. Update the
reference in the metrics/ description to point generically to the existing
complexity scanner using a repo-relative or tool-agnostic identifier instead of
/Users/aleksandrmakarov/code/scripts/complexity-scan.py, while keeping the rest
of the metrics contract unchanged.

In
`@plugins/awos/skills/ai-readiness-audit/references/ai-sdlc-metrics-catalog.md`:
- Line 31: The ADP-G7 revert pattern is too narrow because the `^Revert"` match
misses standard revert subjects like `Revert "..."`, which can undercount
failures in the metrics guidance. Update the revert/rollback pattern in the
ADP-G7 entry of the metrics catalog so it matches the common spaced form used by
revert commits, while keeping the existing hotfix and rollback terms intact.

In `@plugins/awos/skills/ai-readiness-audit/references/data-sources.md`:
- Around line 98-101: The global SKIP rule in data-sources.md conflicts with
metric-specific requirements such as MTTR’s need for a real incident source.
Update the wording around the partial source and SKIP rule sections to state
that the generic fallback applies only unless a metric defines stricter
required-source contracts, and explicitly note that metric-specific rules like
MTTR override the default behavior. Keep the guidance aligned with the existing
metrics/ layer language so readers can tell when a metric should still skip
despite partial source availability.

In `@plugins/awos/skills/ai-readiness-audit/scoring.md`:
- Around line 20-22: Add a language tag to each unlabeled fenced code block in
the scoring markdown so it passes MD040. Update the fenced examples around the
dimension_score, coverage_ratio, and audit_total formulas to use a consistent
annotation such as text, keeping the existing content unchanged. Use the
markdown sections containing those formulas as the target for the fix.

In `@tests/lint-prompts.test.js`:
- Line 1118: The lint prompt test is using an overly broad regex that can match
unrelated words like “degraded” or “upgrade.” Update the assertion in the test
around assert.doesNotMatch in tests/lint-prompts.test.js to use a bounded
“grade” pattern that only matches the standalone word, keeping the intent of
dimension-auditor must not emit a grade while avoiding false failures.
- Around line 1008-1030: The current check in lint-prompts.test.js only verifies
that required keys exist somewhere in standards.toml, so malformed [category.*]
tables can still pass. Update the assertion logic around the existing
[category.*] scan to validate each category table block individually, using the
same symbols/loop in the test, and ensure every table contains all required keys
within its own section rather than relying on file-wide matches.

---

Nitpick comments:
In `@tests/lint-prompts.test.js`:
- Around line 1238-1241: The block parsing in the lint prompt test is too broad
because it splits on every “###” heading instead of only the check-block
headings. Update the logic around body.split and the subsequent block-processing
loop in the test to match only “### CODE-NN” headings, so extra non-check
subheadings are ignored. Keep the rest of the dimension-block extraction flow
the same, but ensure the heading filter is specific to the contract enforced by
this test.
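Two of the findings above come down to regex boundaries: the ADP-G7 revert pattern and the over-broad "grade" assertion. The sketch below shows one possible shape for each. The exact patterns are assumptions, since the real catalog entry and test assertion are not quoted in full here, and the hotfix/rollback terms are left out because their wording is not shown.

```typescript
// `git revert` writes subjects of the form: Revert "original subject".
// Allowing optional whitespace before the quote matches that form as well as
// the unspaced `Revert"` the catalog pattern currently targets.
const REVERT_SUBJECT = /^Revert\s*"/;

// Word boundaries restrict the match to the standalone word "grade", so
// "degraded" and "upgrade" no longer trip the assertion.
const GRADE_WORD = /\bgrade\b/i;

function isRevertSubject(subject: string): boolean {
  return REVERT_SUBJECT.test(subject);
}

function mentionsGrade(text: string): boolean {
  return GRADE_WORD.test(text);
}

console.log(isRevertSubject('Revert "Add cache layer"'), mentionsGrade("service degraded after upgrade"));
```

The bounded form deliberately does not match the plural "grades"; if the contract should cover it, the pattern would need `grades?` instead.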


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 63f87513-b53d-4915-871a-9d0780bf7a55

📥 Commits

Reviewing files that changed from the base of the PR and between 7df798b and a453584.

📒 Files selected for processing (25)

.claude-plugin/marketplace.json
docs/design/ai-sdlc-exec-deliverable.md
docs/design/ai-sdlc-measurement-and-scoring-plan.md
plugins/awos/.claude-plugin/plugin.json
plugins/awos/agents/dimension-auditor.md
plugins/awos/skills/ai-readiness-audit/SKILL.md
plugins/awos/skills/ai-readiness-audit/dimensions/ai-development-tooling.md
plugins/awos/skills/ai-readiness-audit/dimensions/code-architecture.md
plugins/awos/skills/ai-readiness-audit/dimensions/documentation.md
plugins/awos/skills/ai-readiness-audit/dimensions/end-to-end-delivery.md
plugins/awos/skills/ai-readiness-audit/dimensions/project-topology.md
plugins/awos/skills/ai-readiness-audit/dimensions/prompt-agent-integrity.md
plugins/awos/skills/ai-readiness-audit/dimensions/quality-assurance.md
plugins/awos/skills/ai-readiness-audit/dimensions/security.md
plugins/awos/skills/ai-readiness-audit/dimensions/software-best-practices.md
plugins/awos/skills/ai-readiness-audit/dimensions/spec-driven-development.md
plugins/awos/skills/ai-readiness-audit/dimensions/supply-chain-security.md
plugins/awos/skills/ai-readiness-audit/output-format.md
plugins/awos/skills/ai-readiness-audit/references/ai-sdlc-metrics-catalog.md
plugins/awos/skills/ai-readiness-audit/references/data-sources.md
plugins/awos/skills/ai-readiness-audit/references/standards.md
plugins/awos/skills/ai-readiness-audit/references/standards.toml
plugins/awos/skills/ai-readiness-audit/report-template.md
plugins/awos/skills/ai-readiness-audit/scoring.md
tests/lint-prompts.test.js

… bundle + CI job)

- Install devDeps: typescript, tsx, esbuild, @types/node; runtime: smol-toml
- Add tsconfig.json (NodeNext, strict, allowImportingTsExtensions, noEmit)
- Add tests/helpers.ts: loadStandards() via smol-toml, writeCollected() fixture helper
- Add tests/smoke.test.ts: asserts meta.monthly_bucket_days===30, max_lookback_days===730
- Add scripts/build-engine.mjs: esbuild driver bundling collectors/detectors/metrics entrypoints
- Create collectors/, detectors/, metrics/, dist/ scaffold dirs with .gitkeep
- Add test:engine and build:engine scripts; fold test:engine into npm test
- Add node-engine CI job to quality-check.yml
- Add TS scaffold presence lint check to tests/lint-prompts.test.js (56 tests, all pass)

…mplify build-engine

- Add `npm ci` step to the `test` job in quality-check.yml so tsx is available when `npm test` chains into `test:engine` (fixes MODULE_NOT_FOUND on CI).
- Remove `allowScripts` from package.json (Bun-only convention; npm ignores it).
- Replace dynamic `import('node:fs')` in build-engine.mjs with a direct `writeFileSync` call, adding it to the existing static import.

…categories + lint + schema test

Classifies every [category.*] table in standards.toml with a method field: computed (numeric result), detected (deterministic boolean from regex/glob/AST/config), or judgment (semantic sampling required). The 7 judgment categories additionally carry rubric and evidence_required. Both the JS regex lint (tests/lint-prompts.test.js) and a new TypeScript engine schema test (standards-schema.test.ts) guard the vocabulary and the judgment-requires-rubric contract. standards.md documents the Method section. Prettier clean.
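The method vocabulary and the judgment-requires-rubric contract described in this commit could be checked along these lines. This is a sketch under assumptions: it takes an already-parsed category map (for example, the result of parsing standards.toml), the `validateCategories` name and error wording are invented, and since the type of `evidence_required` is not stated, only its presence is checked.

```typescript
const METHODS: readonly string[] = ["computed", "detected", "judgment"];

// Loose shape: values come from a parsed TOML file, so nothing is trusted yet.
interface RawCategory {
  method?: unknown;
  rubric?: unknown;
  evidence_required?: unknown;
  [key: string]: unknown;
}

// Returns a list of human-readable problems; an empty list means the map is valid.
function validateCategories(categories: Record<string, RawCategory>): string[] {
  const errors: string[] = [];
  for (const [name, category] of Object.entries(categories)) {
    if (typeof category.method !== "string" || !METHODS.includes(category.method)) {
      errors.push(`${name}: method must be one of ${METHODS.join("/")}`);
      continue;
    }
    if (category.method === "judgment") {
      if (typeof category.rubric !== "string" || category.rubric.trim() === "") {
        errors.push(`${name}: judgment category requires a rubric`);
      }
      if (category.evidence_required === undefined) {
        errors.push(`${name}: judgment category requires evidence_required`);
      }
    }
  }
  return errors;
}

// Category names below are hypothetical.
const schemaProblems = validateCategories({
  lockfile_present: { method: "detected" },
  doc_quality: { method: "judgment", rubric: "Explains setup and usage", evidence_required: ["README"] },
  bad_method: { method: "guessed" },
  no_rubric: { method: "judgment" },
});
console.log(schemaProblems);
```

Validating each table on its own, rather than searching the whole file for the key names, is also what the review comment on the [category.*] scan asks for.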

…erminism over LLM sampling)

Adds collectors/git.ts (always-available Tier-G collector) that shells to git via execFileSync to gather default_branch, monthly_buckets, merge_records, revert_merges, total_merges, ai_marked_commits, total_commits, tooling_paths, and numstat_totals. Hermetic node:test suite builds a throwaway git repo with pinned GIT_AUTHOR_DATE / GIT_COMMITTER_DATE for fully deterministic assertions. Also adds plugins/awos/skills/ai-readiness-audit/dist/ to .prettierignore so generated bundles are not flagged by prettier.

… (drop Date.now), remove dead code

- buildMonthlyBuckets: window end is now the latest committer date from git (git log --all --format=%cI --max-count=1); since = latestCommitDate − lookback_days. Date.now() is gone, so buckets are a pure function of git history + period params.
- Removed dead rangeOut call in getMergeRecords (fired nonsensical sha^2..sha^2 range, result was discarded immediately).
- Removed unused countLines helper.
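The idea of anchoring the window on the latest commit instead of the wall clock can be sketched as a pure function. This is not the engine's `buildMonthlyBuckets`: the function name, the `Bucket` shape, the newest-first ordering, and the half-open boundary rule are all assumptions. The default 30-day bucket and 730-day lookback mirror the values asserted in the smoke test.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface Bucket {
  startMs: number; // exclusive lower bound
  endMs: number; // inclusive upper bound
  commits: number;
}

// Buckets are returned newest first. The window end is the latest commit time,
// so the result depends only on the inputs, never on the current date.
function sketchMonthlyBuckets(commitTimesMs: number[], bucketDays = 30, lookbackDays = 730): Bucket[] {
  if (commitTimesMs.length === 0) return [];
  const endMs = commitTimesMs.reduce((max, t) => (t > max ? t : max), -Infinity);
  const sinceMs = endMs - lookbackDays * DAY_MS;
  const count = Math.ceil(lookbackDays / bucketDays);
  const buckets: Bucket[] = [];
  for (let i = 0; i < count; i++) {
    buckets.push({
      startMs: Math.max(sinceMs, endMs - (i + 1) * bucketDays * DAY_MS),
      endMs: endMs - i * bucketDays * DAY_MS,
      commits: 0,
    });
  }
  for (const t of commitTimesMs) {
    if (t <= sinceMs) continue; // older than the lookback window
    const bucket = buckets[Math.floor((endMs - t) / (bucketDays * DAY_MS))];
    if (bucket) bucket.commits += 1;
  }
  return buckets;
}

const latest = Date.UTC(2026, 5, 1);
const sampleTimes = [latest, latest - 10 * DAY_MS, latest - 45 * DAY_MS, latest - 200 * DAY_MS];
const bucketCounts = sketchMonthlyBuckets(sampleTimes, 30, 90).map((b) => b.commits);
console.log(bucketCounts);
```

Running the function twice on the same history yields the same buckets regardless of when the audit runs, which is the property the commit is after.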

…ause syntax fix

- Add detectors/software_best_practices.ts with three detectors:
  - detectExceptClauseDefect (2706): FAILs on Python-2 `except A, B:` syntax
  - detectErrorHandling (2704): heuristic over catch/except blocks, giving FAIL/WARN/PASS
  - detectLockfiles (2705): PASS if any recognised lockfile present
- Add DETECTORS map: { 2704, 2705, 2706 } → detect functions
- Add [category.sbp_except_clause_syntax] (code 2706, method=detected) to standards.toml
- Update SBP-06 in software-best-practices.md: Category 2704, 2706
- 14 hermetic unit tests; all gates green (39 engine, 56 lint-prompts, build:engine, prettier)
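A line-based heuristic for the `except A, B:` defect might look like the sketch below. It is an illustration only: the regex and the `findExceptCommaLines` helper are assumptions and not the code of `detectExceptClauseDefect`, and a pure regex can still be fooled by matching text inside a multi-line string.

```typescript
// Matches an unparenthesised `except A, B:` clause at the start of a line.
// `except (A, B):` and `except A as e:` do not match.
const EXCEPT_COMMA = /^[ \t]*except[ \t]+[A-Za-z_][\w.]*[ \t]*,[ \t]*[A-Za-z_][\w.]*[ \t]*:/;

// Returns the 1-based line numbers that contain the defect.
function findExceptCommaLines(pythonSource: string): number[] {
  const hits: number[] = [];
  pythonSource.split("\n").forEach((line, index) => {
    if (EXCEPT_COMMA.test(line)) hits.push(index + 1);
  });
  return hits;
}

const samplePython = [
  "try:",
  "    run()",
  "except ValueError, TypeError:",
  "    pass",
  "except (KeyError, IndexError):",
  "    pass",
  "except OSError as e:",
  "    pass",
].join("\n");

const exceptHits = findExceptCommaLines(samplePython);
console.log(exceptHits);
```

Reporting line numbers rather than a bare boolean gives the check an evidence line it can surface in the report.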

…llect/detect/metric)

Replace per-module multi-entrypoint bundling with a single cli.ts that esbuild inlines into dist/cli.js. The dispatcher handles `collect <source> <repoPath>`, `detect <code> <repoPath>`, and `metric <id>` (stubbed, exits non-zero until metric modules land). build-engine.mjs cleans dist/ before building so all stale flat + nested files are removed. Adds hermetic smoke tests in cli.test.ts.

A check whose precondition is absent used to award a vacuous PASS, letting an empty or low-maturity repo read as compliant (dossier 02-metric-range-and-interference.md §A3). Absence now emits SKIP, which excludes the check from the coverage denominator:

- ARCH-02: no source files, or no files under recognised layer dirs
- SDD-03: no architecture document to match against
- SDD-05/SDD-06: no spec directories
- SEC-05: no sensitive file types relevant to the stack
- SBP-08: no Python source (belt-and-braces with topology.has_python)

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

Restructures the dimension taxonomy so each dimension is a coherent capability area and the report reads industry-standard engineering first, AI-frontier last (scores change; audit is unreleased):

- New Delivery Flow dimension: the DORA family moves out of the ai-sdlc-adoption grab-bag: DF-01 deploy frequency, DF-02 lead time, DF-03 PR cycle time, DF-04 change-failure rate, DF-05 review rework, DF-06 rework rate, DF-07 MTTR.
- New Descriptors dimension (unscored, rendered last): DESC-01 contributors, DESC-02 churn, DESC-03 complexity, DESC-04 scale, DESC-05 dependency counts. All weight 0 with a neutral INFO badge (size/activity describe a repo, they don't grade it), and the headline Merges/LOC throughput echo moves onto this page. Fixes the cannot-reach-zero saturators (dossier §A2/C2).
- Security dimension dissolved: SEC-01/03/05 → Application Security (AS-12 env gitignored, AS-13 env template, AS-14 sensitive-file ignore coverage); SEC-02 → AI Security (AIS-07 agent guardrail hooks); SEC-04 dropped because it duplicated AS-05 (no hardcoded secrets).
- End-to-End Delivery dimension dissolved: E2E-01/04 → Software Best Practices (SBP-09 vertical delivery, SBP-10 no orphaned artifacts), E2E-03 → Documentation (DOC-07 spec traceability), E2E-05 → Code Architecture (ARCH-07 cross-layer tooling). Mis-seeded source fields corrected (SBP-09 is a git signal, not a DORA citation).
- prompt-agent-integrity renamed to ai-security (PAI-NN → AIS-NN).
- Dimension order is data: standards.toml [meta].dimension_order; audit-core stamps order/title/description onto every artifact, aggregate preserves it, and the renderer shows each dimension's description as a hover tooltip on the summary row and dim page.
- standards.toml physically regrouped by the new order. Weight spread across scored dimensions is now 27-86 (was 16-139).

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

Squash/rebase-merging a PR leaves no merge commit, so every metric built on `git log --merges` silently read 0 on healthy repos, and per-author merge counts credited only whoever clicked the merge button (dossier 03-squash-merge-blind-spot.md).

- The git collector now counts squash-merged PRs as merge events: first-parent non-merge trunk commits carrying a forge PR ref (GitHub "Title (#123)", Azure DevOps "Merged PR 123: …", Bitbucket "(pull request #12)", GitLab "See merge request …!45" in the body), attributed to the commit author, which for a squash merge IS the PR author. window_stats gains merge_commits / squash_merges / merge_strategy (merge-commit | squash | mixed | unknown).
- Deploy frequency, change-failure rate, and rework rate now measure squash repos correctly (their revert/fix keyword filters cover the squashed subjects too).
- Lead time, PR cycle time, and review rework are merge-record proxies that cannot exist without real merge commits: on a squash-strategy repo they SKIP with a connector-pointing reason instead of reporting a confident number from unrepresentative residue. The MTTR git proxy stays included (per its contract) but degrades confidence with an explanatory note.
- activeContributors: merge-share now includes squash events, restoring the safety valve on squash repos; when a repo has no merge events at all, commit-share replaces merge-share so the rule can't degenerate to LOC-share-only (the '1 active of 9' collapse).

Validated on the dossier's evidence repo: 19 merges credited to one maintainer → 114 merge events across the 4 real PR authors, strategy=squash.

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
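Recognising the four forge PR-reference formats listed in this commit could be sketched as below. The regexes, the `detectSquashPrRef` name, and the rule order are assumptions made for illustration; the collector's actual patterns are not shown in the commit message, and the first-parent and non-merge filtering it describes is out of scope here.

```typescript
type Forge = "github" | "azure-devops" | "bitbucket" | "gitlab";

interface PrRef {
  forge: Forge;
  pr: number;
}

// Looks for a forge PR reference in a commit subject (or, for GitLab, the body).
// Returns null when the commit does not look like a squash-merged PR.
function detectSquashPrRef(subject: string, body = ""): PrRef | null {
  const rules: Array<[Forge, RegExp, string]> = [
    ["azure-devops", /^Merged PR (\d+):/, subject], // "Merged PR 123: Title"
    ["bitbucket", /\(pull request #(\d+)\)/i, subject], // "... (pull request #12)"
    ["github", /\(#(\d+)\)\s*$/, subject], // "Title (#123)"
    ["gitlab", /See merge request \S*!(\d+)/, body], // "See merge request group/proj!45"
  ];
  for (const [forge, pattern, text] of rules) {
    const match = pattern.exec(text);
    if (match && match[1] !== undefined) {
      return { forge, pr: Number(match[1]) };
    }
  }
  return null;
}

console.log(detectSquashPrRef("Fix login redirect (#123)"));
console.log(detectSquashPrRef("Add cache", "See merge request group/proj!45"));
```

Each detected reference would then count as one merge event credited to the commit author, which is how the commit describes restoring per-author attribution on squash repos.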

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

Report fixes from the barley validation run:

- Headline VALUES now carry tooltips: the underlying check's evidence (how the number was derived) with the check id + status as meta; an absent value ('—') explains itself with the check's skip reason instead of standing bare.
- Metric-routed SKIPs surface the metric's own reliability note (e.g. 'squash-merge workflow: no branch merge records…') instead of the generic 'required data was not available'.
- Active Contributors tooltip states the real share-based rule (active unless BOTH merge-share and LOC-share fall below the 5% threshold), not an invented ≥2-commits heuristic.
- Change-failure definition describes what is actually computed (window keyword proxy), dropping the unresolved 'within N days' placeholder; dependency-count definition drops its SKIP boilerplate.
- Spec coverage tooltip is plain language (no 'check SDD-04' jargon).
- SDD-04 denominator is now MERGED feature work (first-parent merge commits + squash-merged PRs, 90d window, first-parent diffs). On repos whose CI deletes branches after merge, live refs undercounted badly (barley: 11 surviving branches vs ~280 merged PRs). Live-branch evaluation remains only as a fallback for merge-less workflows.

Orchestration cost (profiled: Step 6 hand-patching was ~35 of 47 serial model turns):

- New engine verb 'patch-judgment <outDir> <patches.json|->' applies ALL judgment verdicts in one call: it validates ids, refuses non-judgment checks, clamps scores, derives weight_awarded, and re-aggregates itself. SKILL.md and the repo-auditor agent now mandate it (no hand-edited dimension JSONs, no separate aggregate).
- SKILL.md hardens the engine-call budget (one enrich, one patch-judgment, one render --format both) and forbids interleaving shell processing between Jira pages.

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
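The validate, refuse, clamp, and derive steps listed for patch-judgment can be sketched as a single pure function. This is a hedged illustration: the `Check` and `Patch` shapes, the all-or-nothing validation, the `weight × score` derivation, and the check ids are assumptions, and the re-aggregation step is omitted.

```typescript
interface Check {
  id: string;
  method: "computed" | "detected" | "judgment";
  weight: number;
  score: number; // in [0, 1]
  weight_awarded: number;
}

interface Patch {
  id: string;
  score: number;
}

// Validates every patch first, then applies them all; input is not mutated.
function applyJudgmentPatches(checks: Check[], patches: Patch[]): Check[] {
  const byId = new Map<string, Check>();
  for (const check of checks) byId.set(check.id, check);

  const scores = new Map<string, number>();
  for (const patch of patches) {
    const target = byId.get(patch.id);
    if (!target) throw new Error(`unknown check id: ${patch.id}`);
    if (target.method !== "judgment") throw new Error(`refusing to patch non-judgment check: ${patch.id}`);
    if (!Number.isFinite(patch.score)) throw new Error(`score is not a finite number: ${patch.id}`);
    scores.set(patch.id, Math.min(1, Math.max(0, patch.score))); // clamp to [0, 1]
  }

  return checks.map((check) => {
    const score = scores.get(check.id);
    if (score === undefined) return check;
    return { ...check, score, weight_awarded: check.weight * score };
  });
}

// Hypothetical check ids.
const sampleChecks: Check[] = [
  { id: "JUDG-01", method: "judgment", weight: 8, score: 0, weight_awarded: 0 },
  { id: "DET-01", method: "detected", weight: 5, score: 1, weight_awarded: 5 },
];
const patchedChecks = applyJudgmentPatches(sampleChecks, [{ id: "JUDG-01", score: 1.7 }]);
console.log(patchedChecks);
```

Doing all of this in one engine call is what removes the per-check hand-patching turns the profile identified.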

'sonnet' resolves to the best available Sonnet at run time, so the harness doesn't silently pin audits to an outdated version id.

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

…eup polling

A measured run spent most of its 19 minutes in ScheduleWakeup wait loops polling background Jira/Confluence fetch agents instead of just making the calls. SKILL.md and the repo-auditor agent now state that connector fetches are inline parallel tool calls, the judgment subagent is a single foreground Agent call, and ScheduleWakeup / background tasks / fallback polling are never used in this skill.

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

The old definition (fraction of repos with ≥1 reachable data-source collector) always read 100% because git is always reachable, so the portfolio card carried no information. It is now the mean of the per-repo coverage ratios (awarded ÷ applicable weight): how much of the current standard the reachable sources could actually score. On the sample-org run this reads 69% instead of a vacuous 100%.

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
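The redefined portfolio figure is a plain mean of per-repo ratios, which can be sketched as follows. The `RepoScore` shape, the function name, and the decision to leave out repos with zero applicable weight are assumptions; the sample numbers are invented.

```typescript
interface RepoScore {
  awarded: number; // weight awarded in this repo
  applicable: number; // weight of the checks that applied to this repo
}

// Mean of per-repo coverage ratios. Repos with nothing applicable are left out
// of the mean (an assumption; the commit does not say how they are handled).
function portfolioCoverage(repos: RepoScore[]): number | null {
  const ratios = repos.filter((r) => r.applicable > 0).map((r) => r.awarded / r.applicable);
  if (ratios.length === 0) return null;
  return ratios.reduce((sum, ratio) => sum + ratio, 0) / ratios.length;
}

const samplePortfolio = portfolioCoverage([
  { awarded: 1, applicable: 2 }, // 0.5
  { awarded: 30, applicable: 40 }, // 0.75
]);
console.log(samplePortfolio);
```

A mean of ratios weights every repo equally, so one large repo does not dominate the card the way a pooled awarded ÷ applicable total would.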

…esult segments

- SKILL.md org mode: the wait-for-subagents barrier is satisfied by the foreground Agent calls returning, never by background tasks polled via ScheduleWakeup or filesystem checks. A measured org run doubled its wall time AND cost that way (9 resume segments, each reloading full context uncached).
- Harness: a session split into resume segments emits one stream-json result event per segment, and reading only the last one reported a 94-turn / 18m47s hops run as '9 turns / 78s'. stream_run now aggregates across ALL result events (turns/durations summed, cost and usage from the cumulative maximum, is_error OR-ed), measures true start-to-finish wall_ms itself, and records result_segments so a split session is visible in run-meta.json.

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
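A rough sketch of the aggregation rule described for stream_run, assuming each result event exposes per-segment turns and duration plus a cumulative cost. The `ResultEvent` field names and the `aggregateResults` helper are illustrative assumptions, not the harness's actual types.

```typescript
// Assumed per-segment result event; field names are illustrative.
interface ResultEvent {
  num_turns: number; // turns in this segment
  duration_ms: number; // duration of this segment
  total_cost_usd: number; // assumed cumulative across segments
  is_error: boolean;
}

interface RunSummary {
  turns: number;
  duration_ms: number;
  cost_usd: number;
  is_error: boolean;
  result_segments: number;
}

// Sum turns and durations, take the cumulative maximum for cost,
// OR the error flags, and count segments so a split session is visible.
function aggregateResults(events: ResultEvent[]): RunSummary {
  const start: RunSummary = {
    turns: 0,
    duration_ms: 0,
    cost_usd: 0,
    is_error: false,
    result_segments: 0,
  };
  return events.reduce<RunSummary>(
    (acc, e) => ({
      turns: acc.turns + e.num_turns,
      duration_ms: acc.duration_ms + e.duration_ms,
      cost_usd: Math.max(acc.cost_usd, e.total_cost_usd),
      is_error: acc.is_error || e.is_error,
      result_segments: acc.result_segments + 1,
    }),
    start,
  );
}
```

Reading only the last element of `events` would reproduce the under-reporting described above, which is why every field is folded over the whole array.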

Mindset shift: Coverage (share of the current standard in place) is what a reader should judge a project by; Points is the raw material.

- The report headline is now '##.#% Coverage' with Points secondary, in both HTML and Markdown; the tooltip cites the standard's provenance: 'Average software project score among all applicable metrics by industry standards on <date>', where <date> is the max last_verified across standards.toml categories, stamped into audit.json as standards_meta by audit-core.
- Dimensions and Repositories tables swap the Coverage/Points columns (coverage leads); per-check Points cells show the ratio too: '1.3/8 (16.3%)'; weight-0 descriptor rows show '—'.
- Active Contributors tooltip interpolates the real threshold from standards.toml ({threshold} placeholder resolved from standards_meta) instead of hardcoding '5% by default' prose.
- Org report parity: per-repo reports under per-repo/ get an automatic '← Back to org report' link (detected from the render out-dir); the org rollup passes merged source_windows + standards_meta through so the org header shows the measurement window; Connections & Sources uses the same Connected/Missed template as a per-repo report with (n/N) repo counts ('CI runs (3/8)'), and Tech Stack lists org items the same way; Repositories column headers all carry tooltips.
- SKILL.md: org audit.json carries source_windows/standards_meta through, and 'project' is the portfolio name only, never the inlined repo list (the Repositories table already lists them).

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
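The per-check Points cell format quoted above ('1.3/8 (16.3%)', with a dash for weight-0 rows) could be produced by a formatter along these lines. `formatPointsCell` is a hypothetical name and the renderer's real helper may differ.

```typescript
// Hypothetical formatter for a per-check Points cell.
// weight 0 marks a descriptor row, which shows the em dash placeholder.
function formatPointsCell(awarded: number, weight: number): string {
  if (weight === 0) return "\u2014";
  const pct = (awarded / weight) * 100;
  return `${awarded}/${weight} (${pct.toFixed(1)}%)`;
}

console.log(formatPointsCell(1.3, 8)); // 1.3/8 (16.3%)
```

Keeping the raw ratio and the percentage in one cell lets the coverage column lead without hiding the points that produced it.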

…ime fix Org portfolio cards — one weighting, clear names, coverage first: - All three cards now use the SAME weighting: by active contributors per repo (a 40-person repo moves the portfolio average more than a 2-person one), falling back to equal weights when counts are missing; every card's description states which was used. - Reordered and renamed: 1) 'Standards coverage' (was 'Measurement coverage') — the contributor-weighted mean of the per-repo coverage headlines, so the org card and the per-repo headline are explicitly the same concept; 2) 'Capability score' (now contributor-weighted); 3) 'Repos with AI tooling' with the plain 'X of Y repositories' count in its tooltip. The per-repo headline is renamed to '##.#% Standards coverage' too (table columns keep the short 'Coverage'). Tooltip quality — written for a non-technical reader: - Org headline matrix labels ('Merges / active contributor', 'LOC / active contributor') had no dictionary entry and echoed the label as its own tooltip; they and the Repositories column headers now carry 2-3 plain sentences each. - Every metric-label tooltip ends with 'Standard last verified <date>' — per-check dates from the category's last_verified (now carried on CheckRecord), the overall standards date elsewhere. - Every Coverage tooltip cites 'industry standards on <date>', including the org card and both report headlines. Misleading skip reasons (QA-session finding): buildSkipReason let the 'connect a <source>' template win over applicability because the pseudo-source 'audit' counted as connectable — a TypeScript library showed ~16 checks demanding 'connect a audit source' when the truth was 'this repo has no web app'. Applicability now wins, only real connector sources (tracker/docs/ci/incident) prompt for a connector, and the article agrees ('an incident source'). 
Org-mode cycle time (investigated in a separate session): the repo-auditor agent's 'tools:' frontmatter restricted it to file tools, so per-repo audits structurally could not fetch Jira/Confluence via MCP — every org repo showed 'no tracker connector provided' even when the orchestrator's own probe reached Jira. The restriction is removed (the agent inherits the full toolset), the connector shape gains an optional in_progress_at (Jira: expand=changelog) so the cycle-time headline can actually compute median in-progress→done, and SKILL.md explains the computation. MTTR behavior confirmed correct (incident connector only). SKILL authoring hygiene: org reach 'ai_tooling' summarises with counts only (never repo-name enumeration); recommendations must cite the check they actually remediate; every authored ratio renders as a percentage with one decimal, never a raw float. Harness: claude -p sessions see the operator's user-scope MCP servers even with --setting-sources project, so a test audit could pull live Jira data. The harness now passes --strict-mcp-config by default (--allow-user-mcp opts out). Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
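The contributor weighting used by the portfolio cards can be sketched as below. The `RepoRow` shape and `weightedCoverage` name are hypothetical, and the fallback condition (any missing count, or a zero total, switches every repo to equal weight) is an assumption about what "counts are missing" means.

```typescript
// Hypothetical per-repo row feeding a portfolio card.
interface RepoRow {
  coverage: number; // per-repo coverage ratio in [0, 1]
  activeContributors?: number; // may be missing
}

interface WeightedResult {
  value: number | null;
  weighting: "contributors" | "equal"; // stated in the card description
}

function weightedCoverage(rows: RepoRow[]): WeightedResult {
  if (rows.length === 0) return { value: null, weighting: "equal" };

  const counts = rows.map((r) => r.activeContributors);
  const allKnown = counts.every((c) => typeof c === "number" && c >= 0);
  const total = allKnown ? counts.reduce<number>((a, c) => a + (c as number), 0) : 0;

  // Assumed fallback: any missing count (or a zero total) means equal weights.
  if (!allKnown || total === 0) {
    const mean = rows.reduce((a, r) => a + r.coverage, 0) / rows.length;
    return { value: mean, weighting: "equal" };
  }

  const weighted = rows.reduce(
    (a, r) => a + r.coverage * (r.activeContributors as number),
    0,
  );
  return { value: weighted / total, weighting: "contributors" };
}
```

With a 40-person repo at 50% and a 2-person repo at 100%, the weighted value is 22/42 (about 52.4%), while equal weighting would report 75%.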

…s --target The harness previously died on a non-git --target, so org-mode audits could only be launched by hand — bypassing the blank-slate phase prep and engine-compliance guard (which is how a same-day rerun silently re-presented a previous engine's artifacts). Org folders are now first-class: children with .git are enumerated, and provenance records the repo count instead of a target commit. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

…confidently A broken measurement must read as "couldn't measure", never as a measured verdict. audit_core: a metric that throws or an unknown metric= id now routes its categories to SKIP with a metric-error note and confidence 0 (was: silent FAIL at confidence 1 with empty evidence); aggregate() treats any finite score as explicit so a patched score of 0 is no longer re-inflated to the status default; patchJudgments rejects statuses outside PASS/WARN/FAIL/SKIP; unparseable dimension JSON and unreadable prior audit.json are logged instead of silently dropped; corrupted collector artifacts report "unreadable", not "not found"; coverage is null (not 0) when no weight is applicable; CI platform naming defers to ciPlatformName (single source of truth); reliability confidence unified on HIGH/MED/LOW. git collector: collect() probes `git rev-parse --git-dir` first and emits available:false with the real error on broken environments (was: available:true with all-zero stats scored at full confidence); run() classifies expected-empty exit 1 (silent) vs fatal 128/ENOENT/EACCES/ENOBUFS (logged + artifact flipped unavailable); commit-less repos no longer spam breadcrumbs. AI-attribution grep runs under --extended-regexp so the (Windsurf|Cascade) alternation matches. ci collector: connector-only path no longer claims a CI config was detected. languages: evidence labels derive from the extensions actually matched. New Bitbucket "(pull request #N)" squash fixture. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
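The "couldn't measure" principle above might look like this in miniature. The types, the `runMetric` wrapper, and the note wording are illustrative assumptions rather than the actual audit_core code.

```typescript
type Status = "PASS" | "WARN" | "FAIL" | "SKIP";

interface MetricResult {
  status: Status;
  score: number;
  confidence: number;
  note: string;
}

type Metric = () => MetricResult;

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// A broken measurement becomes SKIP at confidence 0, never a confident FAIL.
function skipWithError(reason: string): MetricResult {
  return { status: "SKIP", score: 0, confidence: 0, note: `metric-error: ${reason}` };
}

function runMetric(id: string, registry: Map<string, Metric>): MetricResult {
  const metric = registry.get(id);
  if (!metric) return skipWithError(`unknown metric id '${id}'`);
  try {
    const r = metric();
    // Clamp so a stray value cannot leave the [0, 1] range.
    return { ...r, score: clamp01(r.score), confidence: clamp01(r.confidence) };
  } catch (err) {
    return skipWithError(err instanceof Error ? err.message : String(err));
  }
}
```

The key design point is that both failure paths (unknown id and thrown error) carry the real reason in the note, so the report can explain why nothing was measured.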

Empirically verified false signals, each now pinned by a regression test: .env pattern no longer matches process.env/os.environ (AIS-07 false PASS); gitignore check accepts .env* / .env*.local (Next.js/CRA defaults); dead \b-before-@ regexes fixed so @app.route/@RestController/express()/FastAPI() register (DOC-03 wrong SKIP, validation/rate-limit detectors); catch-opener matches K&R `} catch (`; PEP 508 env markers no longer read as the literal "$1" (unpinned); RAW_SQL requires SQL continuations so UI copy like "Delete item" stops failing ARCH-04; mutation routes require a router-ish receiver (cache.delete/axios.post excluded); RSpec/Jasmine spec/ dirs no longer earn SDD-04 spec-driven credit; prose "go"/"node" no longer count as tech mentions (SDD-03); import-graph layering WARNs on 1 violation and FAILs on 2+ as documented; single-token filenames count as compatible with every lowercase convention; single-service repos SKIP the per-service-README check; detached- HEAD pseudo-branch entries filtered; schema/namespace hosts exempt from the TLS origin count; md5/sha1 flag requires password context, not any "hash"; pandas/numpy alone no longer classify a repo as an ML project. iterFiles: 512 MB maxBuffer (ENOBUFS on big monorepos silently flipped topology flags off), path-containing globs now work via find -path, and the promised JS-walk fallback exists. makeResult clamps score/confidence to [0,1]. Message-less assert.ok calls in the PAI/QA test files now name their contracts. Unused RANGED_RX deleted. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

…e/MTTR Three metrics hardcoded score 1.0 and awarded full weight regardless of the measured value: review rework (DF-05) now bands avg commits/PR, issue throughput (ADP-18) scores 0 at zero resolved and bands resolved/week, pipeline duration (ADP-16) SKIPs on null and bands the duration. CI pass rate guards empty runs (0/0 NaN poisoned audit_total). MTTR keeps the git proxy's minimal reliability even when an incident source exists — the value never came from incident data. Sub-task split averages over all parent-eligible tickets so the best-case anchor is reachable and one epic can't dominate. Doc coverage loses its 0.8/0.6 award cliff (continuous score modulates the weight). Tooling depth path matching is boundary-safe (.awos-legacy no longer matches .awos). Shared readArtifact() in metrics/_base.ts: every metric now degrades to SKIP with the parse error in its note when a collector artifact is corrupt, instead of throwing into a confident FAIL. AST init failures carry the real error into the SKIP note. categories_awarded typed number[]. Org report: rollup carries each repo's cycle_time and mttr display values into the portfolio per_repo rows (the Repositories table rendered these columns 100% blank because the fields were never populated). PR cycle time prefers tracker tickets with changelog-derived in_progress_at -> resolved_at intervals and falls back to the git branch-lifetime proxy — which also rescues squash-merge repos; connector-shapes.md documents the field. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
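Two of the guards mentioned above, sketched under assumptions: `ciPassRate` and `hasPathSegment` are hypothetical names, and returning `null` for an empty run list is one possible way to keep a 0/0 NaN out of the totals.

```typescript
// Guard against 0/0: with no runs there is no pass rate to report.
function ciPassRate(passed: number, total: number): number | null {
  return total > 0 ? passed / total : null;
}

// Boundary-safe directory match: compare whole path segments,
// so ".awos-legacy" does not count as ".awos".
function hasPathSegment(filePath: string, dir: string): boolean {
  const segments = filePath.split(/[\\/]+/).filter((s) => s.length > 0);
  return segments.indexOf(dir) !== -1;
}
```

A prefix or substring test such as `filePath.startsWith(".awos")` would accept `.awos-legacy/notes.md`, which is the false match the commit describes.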

…isibility Markdown delivery table no longer prints the literal "undefined" for a valueless ungated row; the evidence tooltip drops its double HTML-escape (tip() already escapes); a new mdCell() helper escapes pipes/newlines in every untrusted Markdown table cell so an LLM-authored title can't corrupt the table; the org Repositories table renders the newly-carried cycle_time/mttr values; Merges/LOC tooltips say "per week" to match the displayed rate; the header schema comment documents every top-level field the orchestrator must preserve and points at the design doc that exists. PENDING_JUDGMENT is counted and surfaced explicitly in both report formats (amber chip in HTML, suffix in MD) instead of masquerading as SKIP — an unpatched headless run is now visibly unfinished. Unknown statuses warn on stderr. cli.ts: local AuditJson duplicate renamed to a parse-boundary ParsedAudit; usage strings list all 11 verbs. New adversarial escaping suite (hostile strings through evidence, insights, recommendations, repo names — pins single-escaping and intact MD tables) and patch-judgment/render CLI dispatch tests (stdin "-", invalid JSON, bad --format, missing --out-dir). Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
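A cell-escaping helper in the spirit of the mdCell() mentioned above might look like this. It is a sketch only: the real helper's exact replacements are not shown in the commit message, so collapsing newlines to a space and rendering missing values as an empty string are assumptions.

```typescript
// Sketch of a Markdown table-cell escaper (behaviour assumed, see lead-in).
function mdCell(value: unknown): string {
  // Missing values render as an empty cell, never the literal "undefined".
  if (value === undefined || value === null) return "";
  return String(value)
    .replace(/\|/g, "\\|") // a raw pipe would start a new column
    .replace(/\r?\n/g, " "); // a raw newline would end the table row
}

const row = `| ${mdCell("Fix a|b\nsoon")} | ${mdCell(undefined)} |`;
console.log(row); // | Fix a\|b soon |  |
```

Routing every untrusted cell through one helper is what lets an adversarial test suite pin the behaviour in a single place.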

Nothing typechecked the engine (tsx and esbuild both strip types), and the JSON-artifact shapes were declared independently in audit_core, render, cli, and org_rollup — with live drift: render's status union lacked INFO and PENDING_JUDGMENT, cli redeclared AuditJson, render read undeclared fields through index-signature casts. `npm run typecheck` (tsc --noEmit) now runs in CI next to the dist gate, and a new artifact_types.ts is the single source for CheckStatus, reliability vocabularies, Check/DimensionArtifact/AuditJson/ PerRepoSummary/OrgConnections (type-only imports, zero bundle cost). The typecheck immediately caught a real bug: `cli.js collect tracker <repo>` passed the git tunables object as the connector payload, fabricating an available:true tracker artifact with zero tickets; the collector registry is now typed so only git receives its options. CI dist gate stages the directory first (git add -A + diff --cached) so an untracked build output — a newly bundled grammar — can no longer slip past the sync check. dist/ rebuilt from the fixed sources. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

plugins/awos/README.md described the removed architecture — per-dimension context windows, a PASS/WARN/FAIL deduction table, A-F grades, an 8-dimension list; rewritten from the engine model (13 dimensions, additive weighted scoring, JSON artifacts, both reports always rendered). CLAUDE.md corrected: patch-judgment added to the verb list, dist gate described as currently non-blocking, injection described truthfully (present but does not fire in plugin skills — Step 5 is the mechanism), four test layers, no more "3-tab"/per-dimension-phase/history annotations. ai-sdlc-adoption.md check ids now match what the engine emits (ADP-01..06, ADP-14..25 — the doc used a private ADP-G*/C*/I* scheme a report reader could never find). SKILL.md: contiguous step numbering, "no per-dimension fan-out" qualified (org repo-auditors and the judgment pass are sanctioned), one bold rule kept, org headline figures copied from the rollup so the header can't disagree with the portfolio cards. Judgment arrays are written inside the audit output dir — repo-auditor.md previously had every concurrent org auditor share /tmp/judgments.json, letting one repo's verdicts patch another. Dissolved end-to-end-delivery/security references replaced; detectors/ and metrics/ READMEs match the real registry and result shapes; ORG-02/ORG-03 standards definitions match the contributor-weighted rollup; harness README documents the per-repo/<repo>/ layout, org-folder --target, --model and --allow-user-mcp. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

A claude run that exited non-zero or errored printed "run complete" and exited 0 while the fallback render polished its partial artifacts; it now prints an INCOMPLETE banner, records partial:true and judgments_patched in run-meta, and exits 1. Engine compliance counts actual audit-core tool_use events (plus the executed-injection marker) instead of substring hits the prompt text itself satisfies. Marketplace repoint failures die with the captured stderr and restore failures warn loudly; restore writes each config file's own original value instead of km_source into both. Org repo count globs per-repo/*/audit.json (was per-repo/*.json — always 0). standards-linkcheck classifies a deep link that redirects to the site root as DEAD (the most common way doc links die previously passed). build-engine exits 1 when the core wasm or any bundled grammar is missing instead of warning and shipping a broken bundle. lint-prompts assertions that could not fail now pin real content: the cli render invocation, the no-individual- attribution privacy sentence, additive weighted points, org-portfolio.json. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

The pr-review/pr-comments-address skills write their drafts and plans to review/; ignore the folder so a local review trail can't ride into a PR. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

… honest gated notes, org-parent guard Four org-report defects from the provectus-barhopping runs, root-caused against the live Atlassian MCP: Gated rows lied about the cause: "— (needs ticketing connector)" rendered even when Jira was connected but tickets lacked status-transition history. DeliveryMetric gains an optional note rendered as "— (<note>)"; SKILL.md instructs authoring the real state (history not fetched / zero resolved in window / partial fetch) and forbids the needs-connector default when a tracker was reachable. Cycle time and MTTR now carry reader-grade tooltips on the headline rows and the org Repositories column headers, explaining exactly what data the metric needs and why it can be gated with a tracker connected. The g5 squash-repo SKIP reason names the changelog gap precisely. The Jira MCP provides everything the metrics need — the prompts just never asked for it: searchJiraIssuesUsingJql caps at 100/page (the "exactly 100 tickets" and the run-to-run count drift) and paginates via nextPageToken; changelogs come only from per-issue getJiraIssue(expand: "changelog"). connector-shapes.md now carries the concrete recipe: paginate to completion with stable ordering + computeIssueCount, then a parallel changelog pass over the ~50 most recently resolved tickets, mapping in_progress_at to the first transition into an In-Progress-CATEGORY status (verified: tickets go Backlog -> To Do -> In Review -> Done without a literal "In Progress"). Tracker artifacts must record a fetch_meta completeness block; every tracker-consuming metric surfaces "partial tracker fetch: X of Y tickets" in its reliability note. repo-auditor.md states both requirements inline so org runs stop shipping single-page fetches. judgments_patched=false root cause: the pre-scope audit-core pass audited the non-git org PARENT folder, leaving an unpatchable stray audit at the output root. 
audit-core now detects an org parent (not a work tree, >=2 immediate git children), prints why, and exits 0 without writing artifacts. Org headline reach.contributors is pinned to the single-repo shape "<active> active (of <total> in window, 90d)" (sums across repos) instead of the improvised "39 active across 8 repos". Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
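The in_progress_at mapping and the cycle-time median described above can be illustrated as follows. The `Transition` and `Ticket` shapes are hypothetical simplifications of a Jira changelog (real entries carry more fields), and the category names are assumed to be already resolved per transition.

```typescript
// Hypothetical, simplified changelog entry: when a ticket moved, and the
// status CATEGORY of the status it moved into.
interface Transition {
  at: string; // ISO timestamp
  toCategory: "To Do" | "In Progress" | "Done";
}

interface Ticket {
  key: string;
  resolved_at?: string;
  changelog: Transition[];
}

// First transition into any In-Progress-category status, e.g. "In Review",
// even when no status is literally named "In Progress".
function inProgressAt(ticket: Ticket): string | undefined {
  const hits = ticket.changelog
    .filter((t) => t.toCategory === "In Progress")
    .map((t) => t.at)
    .sort((a, b) => Date.parse(a) - Date.parse(b));
  return hits[0];
}

function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const s = [...values].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Median in-progress -> resolved interval in days; tickets without both
// timestamps are left out rather than guessed.
function medianCycleDays(tickets: Ticket[]): number | null {
  const DAY_MS = 86400000;
  const spans: number[] = [];
  for (const t of tickets) {
    const start = inProgressAt(t);
    if (!start || !t.resolved_at) continue;
    const ms = Date.parse(t.resolved_at) - Date.parse(start);
    if (ms >= 0) spans.push(ms / DAY_MS);
  }
  return median(spans);
}
```

When no ticket has changelog history, `medianCycleDays` returns `null`, which corresponds to the gated "history not fetched" state rather than a zero-day cycle time.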

What the skill does and the engine/orchestrator split, standards.toml as the scoring model in data, headless testing via the audit-test-harness, and the maintenance paths (standards-refresh skill, adding dimensions/metrics, prompt edits, versioning). Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

Root scripts/ is in the installer copy table (scripts/ -> .awos/scripts/, overwritten on update), so build-engine.mjs and the standards-linkcheck pair were being copied into every user's project despite being maintainer-only tooling — users run the prebuilt dist/ and never lint standards.toml. Moved all three to tools/ai-readiness-audit/ (dev-only, never copied), fixed their internal repo-root/default-path resolution for the new depth, and wired the linkcheck tests into npm test as test:audit-tools (previously they matched no test glob and only ran by hand). create-spec-directory.sh stays — it is genuine framework product. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

run_audit_test.py and compare_audit_runs.py become run_audit_test.ts / compare_audit_runs.ts (plus harness_lib.ts for the pure helpers), run via the repo's existing tsx toolchain — same language as the engine, shared shapes, one less runtime prerequisite. All CLI flags, the marketplace repoint/finally-restore, phase seeding/stashing, the engine-compliance guard with retry/salvage, partial-run detection, token accounting across result segments, and the archive/run-meta layout are ported unchanged so old and new runs stay comparable. One deliberate fix over a literal port: die() throws instead of process.exit, which in Node would skip the finally block and leave the marketplace pointed at a worktree. New: a live progress log ([MmSSs]-stamped Bash/Agent tool events, the skill's progress emissions, a 60s heartbeat; --quiet to silence), and a final summary block with wall time (NmSSs), token counts, cost in dollars, turns, compliance/judgments verdicts, and the absolute archived report.html path(s) (org + per-repo). run-meta.json gains report_html. Invoke via npm run audit:test / audit:compare; pure helpers covered by tools/audit-test-harness/ harness.test.ts wired in as test:harness. Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
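The reason die() throws rather than calling process.exit can be shown with a small sketch. `FatalError`, `die`, and `runWithRepoint` are hypothetical names; the point is only that a thrown error unwinds through `finally`, whereas `process.exit` terminates the process without running it.

```typescript
// Hypothetical fatal-error type so the top level can tell it apart.
class FatalError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "FatalError";
  }
}

// Throwing (instead of process.exit) lets enclosing finally blocks run.
function die(message: string): never {
  throw new FatalError(message);
}

// Stand-in for "repoint the marketplace, run, always restore".
function runWithRepoint(log: string[], body: () => void): number {
  log.push("repoint");
  try {
    body();
    return 0;
  } catch (err) {
    if (err instanceof Error && err.name === "FatalError") {
      log.push(`fatal: ${err.message}`);
      return 1; // exit code decided at the top level
    }
    throw err;
  } finally {
    log.push("restore"); // runs on success, on die(), and on unexpected errors
  }
}
```

In a real entry point the returned code would typically be assigned to `process.exitCode` after the restore has completed.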

…anaged artifacts

The headless orchestrator repeatedly skipped the deterministic audit-core pass and hand-computed the audit (barley 2026-07-03: 3 attempts, ~45 min, ~3x cost), citing SKILL.md's dead load-time pre-run narrative to justify it. Fix the prompt at the root and make the skip impossible to complete:

- SKILL.md: delete the never-executing load-time !`...` injection and every "run audit-core only if audit.json is missing" conditional; Step 5 is now the unconditional first scoring action (pre-existing audit.json is stale output to overwrite), with the prohibition stated at the decision point. Frontmatter disallows Edit/NotebookEdit/ScheduleWakeup while the skill is active (verified enforced for plugin skills on Claude Code 2.1.199).
- Engine provenance: audit-core stamps audit.json and every dimension JSON with engine.generated_by; patch-judgment, patch-report, and render refuse an unstamped single-repo audit, and rollup skips unstamped per-repo audits, so a hand-assembled audit cannot become a report.
- Artifacts are engine-managed end to end: new report-context verb emits the flattened authoring context (check values/hints, git window stats, tracker fetch meta) and new patch-report verb merges the authored headline/insights/recommendations into audit.json and emits recommendations.md from the same array; the audit-core/enrich summary lists pending_judgment_checks. The orchestrator authors only judgments.json and report-blocks.json; no inline-script inspection or direct artifact edits remain in the flow.
- Tests: provenance regression suite, patch-report/report-context CLI tests, lint contracts pinning the no-pre-run/stale/provenance/disallowed-tools wording; dist/ rebuilt.

Verified: compliance smoke 3/3 PASS (audit-core called, provenance intact, judgments patched, reports rendered, zero hand-compute/hand-writes/fan-out).

Claude-Session: https://claude.ai/code/session_019HGet8jGBoU9gT5wrYZ2Fp

…readiness-audit/qa

- Move the QA harness from tools/audit-test-harness/ to tools/ai-readiness-audit/qa/ so one folder holds both the engine build tooling and the QA tooling for the one audit command.
- New compliance_smoke.ts (npm run audit:smoke): N headless claude -p runs against a tiny generated fixture repo, verdict per run from hard signals only: audit-core invoked, provenance stamp present, judgments patched, reports rendered (not hand-written), no python/node inline compute, no scoring-JSON hand writes (judgments.json/report-blocks.json exempt), no per-dimension fan-out, no stall-on-question. Fail-fast by default: stop at the first failing run instead of paying for the same failure again (--keep-going for deliberate rate measurement).
- Full harness: engine-skip retries now append a corrective system prompt (a bare relaunch re-confabulates the same skip from leftover artifacts); the spoofable injected_audit_core compliance signal is removed; only an actual audit-core Bash invocation counts.
- Marketplace repoint helpers extracted to harness_lib.ts (shared by the harness and the smoke tool); unit tests for the new transcript signals.

Claude-Session: https://claude.ai/code/session_019HGet8jGBoU9gT5wrYZ2Fp

…urce-probe transparency

Root cause of the missing Cycle time: harness isolation (--strict-mcp-config) stripped ALL MCP servers, and a separate 2026-07-02 run fetched 994 Jira tickets without changelogs yet still rendered the default "needs ticketing connector" next to "Connected: Jira via Atlassian MCP". Three fixes, one principle: the audit assesses the project, not the auditor's environment.

- Project-scope MCP discovery (harness): the target's own declared servers (.mcp.json / mcp.json / .vscode/mcp.json / .cursor/mcp.json, org folder + every repo subdir, VS Code {servers} shape normalized, collisions suffixed) are merged and passed back explicitly via --mcp-config, which --strict-mcp-config honors. User-scope servers stay excluded by design.
- Engine-derived gated rows: audit-core/enrich/aggregate compute audit.derived_delivery (cycle-time median from tracker tickets' in_progress_at→resolved_at, plus the honest gated note like "Jira connected, per-ticket status history not fetched") and the renderer appends those rows, ignoring authored ones, so the headline can never contradict the Connections & Sources section again.
- Source-probe transparency: a new source_probes report block (patch-report) records what was searched per unreachable source (mcp configs, CLIs, auth state) and renders into "Missed / limited", replacing the bare "supply a connector" hint.
- CLI channels (skill): acli/gh/glab are sanctioned measurement channels: gh run list fills collected/ci.json (barley now scores real pipeline metrics), code-host issues can serve as a minimal tracker, and the acli Jira recipe derives the project key from commit-message ticket prefixes; recipes in connector-shapes.md → "CLI channels".
- Harness robustness: marketplace repoint snapshots/writes the whole source object (a github-shaped entry mutated only via .path produced a "corrupted installLocation" rejection); smoke exempts sanctioned collected/*.json connector writes.

Verified on barley: CI connected via gh (200 runs), tracker/docs missed with full probe trails in the report, derived_delivery consistent with sources, engine-compliant run end to end.

Claude-Session: https://claude.ai/code/session_019HGet8jGBoU9gT5wrYZ2Fp
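The project-scope MCP merge described above could be sketched roughly like this. The `McpFile` shape, the `mergeMcpConfigs` name, and the numeric collision suffix are assumptions for illustration; only the two top-level keys (`mcpServers` and the VS Code style `servers`) come from the description.

```typescript
type ServerMap = Record<string, unknown>;

// One discovered config file, already parsed. "source" is its path.
interface McpFile {
  source: string;
  json: { mcpServers?: ServerMap; servers?: ServerMap };
}

// Merge every declared server into one { mcpServers } object.
// VS Code style { servers } is normalized; name collisions get a suffix
// (the "-2", "-3" scheme is an assumption made for this sketch).
function mergeMcpConfigs(files: McpFile[]): { mcpServers: ServerMap } {
  const merged: ServerMap = {};
  const has = (k: string): boolean =>
    Object.prototype.hasOwnProperty.call(merged, k);

  for (const file of files) {
    const servers = file.json.mcpServers ?? file.json.servers ?? {};
    for (const name of Object.keys(servers)) {
      let key = name;
      let n = 2;
      while (has(key)) key = `${name}-${n++}`;
      merged[key] = servers[name];
    }
  }
  return { mcpServers: merged };
}
```

The merged object would then be written to a file and handed to the run explicitly, so only servers the target project itself declares are visible to the audit.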

…harness Net -835 lines with zero behavior change (1288 tests green). Applies the findings of the four-angle quality review (reuse, simplification, efficiency, altitude): Shared infrastructure - metrics/_base: skipMetric() replaces ~45 copy-pasted SKIP stanzas; makeMetricResult tail folded into an options object; squash-merge circuit-breaker note/reliability shared; readArtifact memoized (mtime+size) so git/tracker JSON parses once per pass, not 11x/6x; evaluateAppliesWhen() is the single applies_when interpreter. - metrics/_score: shared median/mean/round1; clamp01 reused everywhere. - detectors/_base: per-(repo, ignore-set) cached file listing replaces ~200-300 find spawns per pass; readTextSafe() replaces ~50 hand-rolled try/readFileSync blocks; hasMatch() gives boolean greps an early exit; DetectorResult.status typed as the PASS/WARN/FAIL/SKIP union. audit_core / cli - sources/source_windows/topology derive from one parsed-artifact map shared by auditCore and aggregate (the two copies had already drifted); patch verbs (aggregate, patch-judgment, patch-report, report-context) split into audit_patch.ts; dimensionFiles() iterator shared. - Registries move to detectors/index.ts and metrics/index.ts; cli.ts is dispatch-only (fail()/readJsonArg()/standardsTomlPath() helpers, merged audit-core/enrich cases, static fs imports). - metric verb derives its collectors from standards.toml sources instead of hardcoded id prefixes; org rollup reader moves to metrics/rollup_input.ts with the delivery check-id table derived from DELIVERY_SPECS and AI-tooling codes derived from standards. check_id single source of truth - Every standards.toml category now carries check_id (62 injected from the dimension .md headings); runtime md parsing (parseCheckIds) is deleted; Layer-1 lint enforces presence and md<->toml agreement. 
Perf - enrich reuses repo-derived checks (detectors, AST metrics) from the per-dimension artifacts and re-scores only connector-affected categories: an enrich pass drops from a full re-audit to ~50ms on a small repo. - git collector: latestCommitDate computed once per collect (was 3 spawns); one all-history squash-merge scan folded in memory for both the unbounded and windowed consumers (was 2 full log passes). - topology: duplicate flag expressions computed once; boolean code greps early-exit. render - renderHtml's ~940-line closure split into top-level section functions; md/html micro-duplications share helpers; canonical COLLECTOR_SOURCES lives in artifact_types.ts. Tests / QA harness - Shared fixture factories in tests/helpers.ts (gitRaw, trackerArtifact, makeCheck/makeDim/makeAudit, makeCheckRecord, tmpDir, gitAs); the ~59 repeated git-raw literals collapse; the two real-repo auditCore shape tests share one run. - QA harness: single stream-json transcript walker, shared claude-spawn wrapper, one-spawn salvage render (--format both), performRun() replaces 8 mutable outer variables, dead exports dropped. CLAUDE.md updated: adding a dimension is four touch points (md, toml, detector module, registry index), and enrich's reuse semantics are documented. Claude-Session: https://claude.ai/code/session_01Ld2EFkQ3DuoFGfvXXqLKZF
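The mtime+size memoization mentioned for readArtifact can be sketched as below. To keep the example testable without touching disk, the file access is injected; `makeReadArtifact`, `FileStamp`, and `Deps` are hypothetical names, not the shared helper's actual signature.

```typescript
// What identifies an unchanged file for caching purposes.
interface FileStamp {
  mtimeMs: number;
  size: number;
}

// Injected file access, so the sketch does not depend on a real filesystem.
interface Deps {
  stat(path: string): FileStamp;
  read(path: string): string;
}

// Returns a reader that parses each artifact once per (mtime, size) stamp.
function makeReadArtifact(deps: Deps): (path: string) => unknown {
  const cache = new Map<string, { stamp: FileStamp; value: unknown }>();
  return function readArtifact(path: string): unknown {
    const stamp = deps.stat(path);
    const hit = cache.get(path);
    if (hit && hit.stamp.mtimeMs === stamp.mtimeMs && hit.stamp.size === stamp.size) {
      return hit.value; // unchanged on disk: reuse the parsed value
    }
    const value: unknown = JSON.parse(deps.read(path));
    cache.set(path, { stamp, value });
    return value;
  };
}
```

With Node's `fs`, the dependencies could be wired as `stat: (p) => fs.statSync(p)` and `read: (p) => fs.readFileSync(p, "utf8")`, since `fs.Stats` exposes both `mtimeMs` and `size`.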

…r, short gated cycle-time note

Findings from the 2026-07-03 provectus-barhopping org QA run:

- Org report rendered a 96-row Dimensions table (12 dims x 8 repos). Root cause was SKILL.md itself: the org-assembly step instructed the model to include "dimensions (aggregated dimension data from all per-repo audits)" in org-portfolio.json, and the renderer rendered whatever it was given. SKILL.md now forbids the key, AuditJson.dimensions is optional (absent on org portfolio JSON), and both renderers ignore top-level dimensions in org mode, so an injected concatenation can no longer become a report table.
- Same run audited repos 2-3x: three repos got two repo-auditor subagents each, and the orchestrator additionally ran audit-core/enrich for five repos in its own context. SKILL.md org branch now states each repo is audited exactly once by exactly one subagent, the orchestrator never runs the engine itself in org mode, and re-dispatch is allowed only for a repo whose audit.json is missing after its subagent returned.
- Per-repo HTML back-link now targets ../../report.html#repos (the org Repositories heading carries id="repos"), returning the reader to the table they navigated from instead of the org report top.
- A gated tracker headline row with a connected-but-unmeasurable tracker now shows the short "- (no tickets data)" placeholder; the full explanation moved to the value tooltip (HTML) and the Connections & Sources tracker line (both formats).

Tests: org-mode no-Dimensions contract (injected dims ignored; renders without the key), #repos anchor + per-repo back-link, short-placeholder + full-note placement, hostile-escaping fixture split into single-repo and org variants. dist/ rebuilt.

Claude-Session: https://claude.ai/code/session_01Ld2EFkQ3DuoFGfvXXqLKZF

…pecheck The Missed/limited probe-log fixture omitted history_available_days, which SourceSummary requires; local runs passed because tsx does not typecheck, but CI's tsc --noEmit gate does. Claude-Session: https://claude.ai/code/session_01Ld2EFkQ3DuoFGfvXXqLKZF

Aleksandr Makarov added 16 commits June 23, 2026 13:22

feat(audit): add AI-SDLC metrics catalog reference

398c733

feat(audit): add AI-SDLC data-sources reference

51fe3ed

feat(audit): add AI-SDLC adoption-index reference

fbab238

docs(audit): rebaseline metrics catalog to collectors/metrics/standards model

1b5d324

docs(audit): add measurement TDD plan + revise exec deliverable to weighted-category model

98797b3

style(audit): prettier-normalize the measurement design spec

0e8ea96

docs(audit): rebaseline data-sources to discover-first + current-state history

f29fe4b

docs(audit): drop 0-100 adoption-index (replaced by per-metric reliability)

e404575

feat(audit): add standards.toml capability-category data + doc + lint

675a1b8

feat(audit): replace A-F deduction scoring with additive weighted categories

e391932

feat(audit): dimension-auditor parses standards.toml, emits weighted …

277e348

…points + reliability

feat(audit): map every dimension check to a weighted standards.toml c…

64d3bd0

…ategory

feat(audit): SKILL.md sums weighted categories, drops overall grade

26d62e2

feat(audit): report templates show weighted points + reliability, dro…

5118e78

…p A-F

chore(audit): bump plugin + marketplace to 2.2.0 (weighted-category s…

a453584

…coring)

AlexanderMakarov added the minor label Jun 24, 2026

coderabbitai Bot reviewed Jun 24, 2026


Aleksandr Makarov added 11 commits June 24, 2026 14:32

fix(audit): reclassify ARCH-02/ARCH-04/DOC-04/SBP-06 to detected (det…

4e2d9bd

…erminism over LLM sampling)

feat(audit): collector artifact contract + shared writer + README

0097ee7

feat(audit): ci/tracker/docs collectors (Tiers C/I/D, partial-vs-absent)

a3c5500

feat(audit): detector contract + shared grep/iter lib + README

0b14a9e

Aleksandr Makarov added 30 commits July 2, 2026 10:22

style: prettier-format the audit-metric-issues dossier

2dcd870

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF

test(audit-harness): default --model to the unpinned sonnet alias

bdd4e95

'sonnet' resolves to the best available Sonnet at run time, so the harness doesn't silently pin audits to an outdated version id.

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF
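A rough sketch of that default, assuming a simple argv scan: the `--model` flag and the `sonnet` alias come from the commit message, while `resolveModel` and the example id are hypothetical and do not reflect the harness's actual parser.

```typescript
// Hypothetical helper: returns the value after --model, or the unpinned alias.
function resolveModel(argv: string[]): string {
  const index = argv.indexOf("--model");
  const value = index >= 0 ? argv[index + 1] : undefined;
  // Fall back to the alias when the flag is absent or has no value.
  return value ?? "sonnet";
}

console.log(resolveModel([]));
console.log(resolveModel(["--model", "example-pinned-id"]));
```

An explicit `--model` still wins, so a run can be pinned deliberately when reproducing an older result.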

chore: keep local review/ drafts out of commits and prettier

cd63683

The pr-review/pr-comments-address skills write their drafts and plans to review/; ignore the folder so a local review trail can't ride into a PR.

Claude-Session: https://claude.ai/code/session_014UpLfgFPaACkrEUuk7CGiF


feat(audit): AI-SDLC measurement engine — weighted-category scoring + deterministic engine + 3-tab report (full re-architecture) #139

AlexanderMakarov wants to merge 238 commits into main from feat/ai-sdlc-metrics

AlexanderMakarov commented Jun 24, 2026 (edited)

coderabbitai Bot commented Jun 24, 2026 (edited)

Reviews paused


❌ Failed checks (1 warning)


1 participant
