Stamp ReleaseHistory v5.0.2 section with nuget links and bump VersionPrefix #2960
Merged
…Prefix

Promotes the v5.0.2 work to ship-ready:

- `src/build.props`: `VersionPrefix` 5.0.1 -> 5.0.2, `PreviousVersionPrefix` 5.0.0 -> 5.0.1.
- `ReleaseHistory.md`: stamp the `v5.0.2` header with nuget links for Sdk / Driver / Converters / Multitool / Multitool Library (UNRELEASED -> shipped).
- `skills/emit-sarif-findings/SKILL.md`: bump the recommended Sarif.Multitool minimum from 5.0.1 to 5.0.2. v5.0.2 is the first release where `emit-init-run` enriches `versionControlProvenance` from the CI pipeline environment (Azure DevOps + GitHub Actions), from which the skill's required commit-sha / branch / repo-uri inputs are stamped automatically.

Six v5.0.2 bullets ship:

* BRK: scripts/Generate-CweTaxonomy.ps1 -> scripts/generate_cwe_taxonomy.py (#2921 / #2950).
* NEW: `emit-init-run` enriches `versionControlProvenance` from ADO + GitHub Actions env (#2957 / #2959).
* NEW: GHAzDO1021 `ProvideShortBranchNameInVcp` (#2954 / #2958).
* BUG: Drop AI1015 `ProvideRunDefaultSourceLanguage` (#2948).
* BUG: SARIF1012 NRE on unresolved ruleId (#2944 / #2949).
* BUG: SARIF1001 case-fold relaxation for AI notification descriptors (#2951 / #2955).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
michaelcfanning added a commit that referenced this pull request on May 28, 2026
* emit-init-run: auto-stamp ADO pipeline automationDetails from env + GHAzDO sample (#2929) * emit-init-run: auto-stamp ADO pipeline automationDetails from env + GHAzDO sample Adds AdoPipelineContext, which detects an Azure DevOps pipeline execution context from the standard predefined environment variables and stamps run.automationDetails so producers that run inside ADO pipelines automatically satisfy GHAzDO1019 and GHAzDO1020 with no additional CLI flags. - TryDetect is three-state (None / Partial / Complete). Partial fails loudly with a per-variable diagnostic before any file-system side effects so a misconfigured pipeline never emits a half-stamped SARIF. - ApplyTo writes the canonical azuredevops/pipeline/build/<org>/<projectId>/<buildDefId>/<phaseId>/<branchRef>/<buildId> id and the four azuredevops/pipeline/build/* property keys ADO Advanced Security ingestion validates. - Composes with the existing --automation-guid / --automation-correlation-guid flags; never overwrites a producer-supplied guid/correlationGuid. CweGenerateSample.ps1 grows a -GHAzDO switch that produces the new CweGHAzDoSample.sarif fixture alongside the existing CweSample.sarif. The script populates the ADO env vars for the duration of emit-init-run so AdoPipelineContext stamps automationDetails, then patches tool.driver.fullName post-finalize so GHAzDO1018 passes. Default-mode runs explicitly clear those same env vars so a developer shell with TF_BUILD=True can never drift the AI-shape fixture. CweGHAzDoSample.sarif validates with zero errors, zero warnings, and zero notes under --rule-kind Sarif;AI;GHAzDO. CweGeneratedSampleTests covers both fixtures with byte-identical regression gates as separate [Fact]s sharing one private helper. 
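
The three-state detection described above (None / Partial / Complete) can be illustrated with a minimal Python sketch. This is not the SDK's `AdoPipelineContext.TryDetect`, which is C#. The `try_detect` function, the `Detection` enum, and the `REQUIRED_VARS` list are hypothetical names, and the variable list is an illustrative subset of ADO predefined variables rather than the set the verb actually requires.

```python
from enum import Enum


class Detection(Enum):
    NONE = "none"          # no pipeline variables set: not running under ADO
    PARTIAL = "partial"    # some set, some missing: fail before any side effects
    COMPLETE = "complete"  # every required variable present


# Assumption: illustrative subset only; the real required set lives in the SDK.
REQUIRED_VARS = ("BUILD_BUILDID", "BUILD_DEFINITIONID", "BUILD_SOURCEBRANCH")


def try_detect(env, required=REQUIRED_VARS):
    """Return (state, problems); problems holds one diagnostic per missing variable."""
    missing = [name for name in required if not env.get(name, "").strip()]
    if len(missing) == len(required):
        return Detection.NONE, []
    if missing:
        return Detection.PARTIAL, [f"{name} is not set" for name in missing]
    return Detection.COMPLETE, []
```

A caller would check for `Detection.PARTIAL` and report `problems` before writing any file, which is the "side effects after detection" ordering the commit message describes.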
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Trim ReleaseHistory bullets + add copilot-instructions.md The two bullets I just added for env-driven ADO stamping and the GHAzDO sample fixture were PR-description-sized, not release-note-sized. Trimmed both to match the style of their neighbors (single self-contained sentence + concrete names + minimal facts a downstream consumer needs). The full narrative — three-state detection prose, env-var precedence table, composition guarantees — already lives on PR #2929 where it belongs. Adds .github/copilot-instructions.md so future agents in this repo see the release-notes-vs-PR-description distinction up front, plus the house idioms that come up repeatedly in code review (no [Theory], GHAzDO casing, AI ruleId convention, sample-fixture convention, side-effects-after-detection, internals-via-InternalsVisibleTo). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Port SARIF AI generation guidance from ai-plugins to sarif-sdk (#2930) Make sarif-sdk the single source of truth for the SARIF spec markdown, the AI-generated-findings profile, and the agent skills that emit and validate AI SARIF. Adds: - docs/spec/sarif-v2.1.0-spec.md Convenience markdown rendering of the OASIS SARIF 2.1.0 specification (Plus Errata 01). The OASIS-published document is canonical; IPR notice preserved at top of file. - docs/ai/generating-sarif.md Normative guidance for representing AI/LLM-produced security findings as first-class SARIF: ai/origin declaration, tool identity, result structure, exploitability and attacker-position vocabulary, evidence model, redaction, notification taxonomy (AI/EXEC/*, AI/CFG/*), and the full AI rule-pack appendix. Includes a Mermaid object-model diagram in the appendix. - docs/ai/example.sarif Comprehensive reference SARIF log that conforms to the AI profile. 
Passes `dotnet sarif validate --rule-kind 'Sarif;AI'` cleanly. - skills/emit-sarif-findings/SKILL.md Agent-operating procedure for emitting AI SARIF using the Sarif.Multitool emit verbs (emit-init-run, add-result, add-notification, emit-finalize --validate). Multitool-only; cross-references docs/ai/generating-sarif.md as the normative source. - skills/validate-sarif-findings/SKILL.md Agent-operating procedure for validating AI SARIF. Uses `--rule-kind 'Sarif;AI'` against the multitool's AI rule pack (AI1003-AI2019) plus the standard SARIF rules in one pass. Updates: - README.md adds a short pointer section to the new spec, guidance, and skills directories. - docs/multitool-usage.md gains a 'Modes' table entry for each of the new emit verbs (emit-init-run, add-result, add-notification, emit-finalize) plus a worked example. Verification gates run before commit: - `dotnet sarif validate docs/ai/example.sarif --rule-kind 'Sarif;AI'` reports 0 errors. - End-to-end smoke test (init -> add-result -> finalize --validate) produces a SARIF file with 1 result, 1 rule (CWE-78 enriched from the embedded MITRE CWE taxonomy). - All skill command snippets match actual --help output for the relevant verb at Sarif.Multitool 5.0.0. Companion work (separate PR in microsoft/ai-plugins): - Delete plugins/sarif/ entirely; the canonical home is now this repository. - Retool Swallowtail (and other AI-detector plugins in ai-plugins) to invoke Sarif.Multitool emit verbs directly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add PublishSampleToGhazdo.ps1 + clone-aware CweGenerateSample.ps1 (#2931) `CweGenerateSample.ps1` now derives `--vcp-repositoryuri` and the `emit-finalize --srcroot` prefix from `git -C $repoRoot remote get-url origin`, falling back to `https://github.com/microsoft/sarif-sdk` when origin is unset. 
On the canonical microsoft/sarif-sdk clone the generated fixtures (CweSample.sarif, CweGHAzDoSample.sarif) are byte-identical to the previous hardcoded form. GitHub origins get a `<repo>/blob/main/` SRCROOT prefix; other hosts (including ADO) get the bare repo URL with a trailing slash. Adds `src/Sarif/Taxonomies/PublishSampleToGhazdo.ps1` -- POSTs a gzipped SARIF to the GHAzDO SARIFs ingestion endpoint (`/{org}/{project}/_apis/alert/repositories/{repo}/sarifs?api-version= 7.2-preview.1` on advsec.dev.azure.com, fallback dev.azure.com). Target org/project/repo are parsed from runs[0].versionControlProvenance[0] .repositoryUri; PAT is read from the ADO_PAT environment variable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Scrub Microsoft-internal references from AI guidance port (#2932) Public-OSS hygiene pass on the SARIF AI guidance and skills. Descriptor ids that are already shipped in the SDK (AI/EXEC/ALAS-SIGNAL in AI2018.ProvideExecutionSignalArtifact and AI1014's AI/EXEC/* and AI/CFG/* prefixes) are kept as-is so the docs match the current SDK implementation. Changes: - Drop ALAS expansion and neutralize the signal-payload schema (descriptor id kept; no payload schema was ever enforced by the SDK). - Replace ProjectApi with FastAPI (five sites) in API-handler examples. - Replace 'Geneva cluster' with 'telemetry cluster' in a deployment example. - Replace example rule id SWT-CPP-001 with ACME-CPP-001. - Replace author: mikefan with sarif-sdk-maintainers in both skill frontmatters. - Soften a reference to an unpublished companion remediation guidance document. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add multitool add-reporting-descriptor verb Appends a fully-formed SARIF reportingDescriptor JSON object — supplied via --input <path> or stdin — to the staged event log produced by emit-init-run. Two targets: * Default → run.tool.driver.notifications[]. 
AI producers routinely emit notification descriptors (progress, telemetry, config errors). No id convention is enforced; notifications use opaque ids. * --rules → run.tool.driver.rules[]. Gated against AIRuleIdConvention.IsNovel so only NOVEL- novel-finding descriptors are accepted. Taxonomy-mapped rule descriptors (e.g., CWE-89) come from the taxonomy enricher at finalize time, not from this verb. Each descriptor id may appear at most once per event log. The verb scans the existing event log on receipt and rejects duplicates against either a prior add-reporting-descriptor event of the same target OR a descriptor pre-populated on the run-header. A --force escape hatch is acknowledged in error text but intentionally out of v1 scope. Event-log plumbing: * Adds SarifEventKinds.RuleDescriptor ("rule-descriptor") and SarifEventKinds.NotificationDescriptor ("notification-descriptor"), threaded through SarifEventLogReader's kind allow-list. * SarifEventReplayer buffers descriptor events and merges them into the target list BEFORE RegisterDescriptorsFromResults runs. This ordering matters: auto-registration synthesizes minimal descriptors only for ruleIds that aren't already represented, so an explicit NOVEL- descriptor pre-empts the minimal one. Header pre-populated descriptors are preserved by reference; the verb's emit-time dedup blocks id collisions between header and events. * New event kinds are additive within CurrentSchemaVersion = 1; older readers will skip unknown kinds harmlessly, matching the forward- compat shape used when Notification / Invocation kinds were added. 
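
The replay ordering above (explicit descriptor events merged before auto-registration) can be sketched in Python as follows. This is a simplified illustration, not the C# `SarifEventReplayer`. The `replay_rules` function and its dict-based descriptors are hypothetical, and raising on a duplicate id at replay time is a simplification of the emit-time dedup the verb performs.

```python
def replay_rules(header_rules, descriptor_events, result_rule_ids):
    """Build the final rules list for a run.

    header_rules:      descriptors pre-populated on the run header (kept as-is)
    descriptor_events: explicit rule-descriptor events from the event log
    result_rule_ids:   ruleIds referenced by results, in order of appearance
    """
    rules = list(header_rules)
    seen = {rule["id"] for rule in rules}

    # Step 1: merge explicit descriptors first, rejecting id collisions.
    for descriptor in descriptor_events:
        if descriptor["id"] in seen:
            raise ValueError(f"duplicate descriptor id: {descriptor['id']}")
        rules.append(descriptor)
        seen.add(descriptor["id"])

    # Step 2: auto-register a minimal descriptor only for ruleIds that are
    # not already represented, so explicit descriptors pre-empt minimal ones.
    for rule_id in result_rule_ids:
        if rule_id not in seen:
            rules.append({"id": rule_id})
            seen.add(rule_id)
    return rules
```

Because step 1 runs before step 2, a rich `NOVEL-` descriptor supplied through the verb survives intact instead of being shadowed by a synthesized `{"id": ...}` stub.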
Tests: * 16 [Fact] tests on AddReportingDescriptorCommand covering both happy paths (notifications default, --rules), id validation (missing/empty/ non-string), the NOVEL gate (taxonomy id rejection on --rules path only), rich payload round-trip (messageStrings, defaultConfiguration, helpUri, properties — including a date-shaped property string to guard against Json.NET DateTime coercion), duplicate detection within and across targets, duplicate detection against header-pre-populated descriptors for both target arrays, missing-wip-file path, and two malformed-input cases (bad JSON, non-object root). * 3 [Fact] tests on SarifEventReplayer covering: rule-descriptor events populating rules and pre-empting auto-registration, notification- descriptor events populating notifications, and the header-pre-populated + events merge semantics. No [Theory]/[InlineData] — repeated scenarios use shared private helpers (SeedRunHeader) per house style. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Strip editorial prefixes from AI notification taxonomy (#2934) Notification descriptor ids now name the concern only — `DECISION`, `RULED-OUT`, `DATA-ACCESS-DENIED`, `ALAS-SIGNAL`, `TOOL-UNAVAILABLE`, etc. The previous `AI/EXEC/*` and `AI/CFG/*` prefixes repeated context the surrounding SARIF already carries: the array (`toolExecutionNotifications` vs `toolConfigurationNotifications`) encodes the kind, and `tool.driver.name` encodes the emitter. The same id MAY now legally appear in both arrays. Suffixing `EXEC` or `CFG` on every id is like suffixing `Class` on every C# class — the surrounding context already says what kind of thing it is. Placement is selected at authoring time: `add-notification` defaults to `toolExecutionNotifications`; `add-notification --config` (`-c`) routes to `toolConfigurationNotifications`. 
The event-log kind `SarifEventKinds.Notification` splits into `ExecutionNotification` (`"execution-notification"`) and `ConfigurationNotification` (`"configuration-notification"`); the replayer routes each to the matching invocation array. `AI1014.ExecutionNotificationPlacement` is deleted. Its sole purpose was enforcing prefix-vs-array consistency, which is structurally meaningless under the new convention (the array IS the kind). `AI2018` retains its semantic; the literal id it checks changes from `AI/EXEC/ALAS-SIGNAL` to `ALAS-SIGNAL`. BRK by the letter of v4.6.3 (AI1014 was added there), but AI rules adoption is low and v5.x is the right place for refinement over back-compat. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Generalize ALAS-SIGNAL notification id to LEARNING-SIGNAL (#2935) ALAS named a specific consumer (an internal learning system). Under the convention shipped in #2934, notification ids name the concern, not the consumer. LEARNING-SIGNAL describes what the signal is, independent of who reads it. While here, rename the AI2018 rule from ProvideExecutionSignalArtifact to ProvideLearningSignalArtifact for consistency: the class checks the LEARNING-SIGNAL id, the "Execution" qualifier was redundant under the new convention (placement is encoded by the array, not the id), and downstream learning systems aren't necessarily reading only the execution-side array. Affects: AI2018 rule class + file + RuleId const + 3 resource keys/messages, the AI2018 row in docs and skills tables, and the UNRELEASED BRK bullet for #2934 (whose own "ALAS-SIGNAL example" becomes "LEARNING-SIGNAL", and which now documents the class-and-id rename together). BRK on the just-merged BRK (both still UNRELEASED) — favored over shipping the consumer name. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Stamp ReleaseHistory UNRELEASED section as v5.0.0 (#2937) build.props was already bumped to <VersionPrefix>5.0.0</VersionPrefix> in #2924 (the SHA-1 BRK). This finishes the v5.0.0 cut by replacing the UNRELEASED placeholder header in ReleaseHistory.md with the canonical version banner (Sdk / Driver / Converters / Multitool / Multitool Library nuget links), matching the v4.6.4 format. Picked up by #2936 (the dev to main promotion PR). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Trim and split the over-descriptive v5.0.0 notification-taxonomy BRK (#2938) Per release-notes house style, bullets are one or two self-contained sentences; PR-description prose belongs in the PR. The original bullet was ~3x the length of its neighbors and re-litigated the motivation. Split into two tighter bullets: 1. Convention change + routing mechanism (id-prefix strip, new --config switch, event-kind split). 2. Rule-table changes (AI1014 removal, AI2018 rename). Drops the "prefixes were redundant because..." explanation, the wire value parentheticals (`"execution-notification"` etc.), and the "ALAS named a specific consumer" parenthetical. The change itself is visible in the renames; the reader doesn't need the rationale. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix CweGenerateSample.ps1 -GHAzDO crash inside real ADO pipelines (#2940) The mseng microsoft.sarif-sdk pipeline broke on the first build of main after the v5.x promotion (build 31555367). Symptom: CweGenerateSample.ps1 (args: -Configuration Release -GHAzDO) exited with code 1. ADO pipeline context is partially configured. Either populate every required variable or clear them all. 
Problems: BUILD_DEFINITIONID='1234' disagrees with SYSTEM_DEFINITIONID='9978' (both name the same pipeline identifier and must match) Root cause: the deterministic-fixture env override in CweGenerateSample.ps1 stamps BUILD_DEFINITIONID=1234 for byte-stable output, but does not also override SYSTEM_DEFINITIONID. ADO agents inject both. The verb's must-match cross-check in AdoPipelineContext.TryDetect (correctly) refuses to proceed when the two disagree. Fix the script (not the verb): add SYSTEM_DEFINITIONID alongside BUILD_DEFINITIONID in the \ ordered hashtable, plus SYSTEM_JOBID / SYSTEM_JOBNAME alongside SYSTEM_PHASEID / SYSTEM_PHASENAME for symmetric hygiene (those pairs are exempt from must-match but the default-mode \ cleanup loop iterates \ and benefits from covering the agent's full fallback set). The fixture SARIF bytes do not change — the primary env vars were already set and are the ones the verb actually reads. Regression gate: new CweGHAzDoSample_RegenerationSucceeds_WhenAmbientAdoFallbackEnvVarsConflict [Fact] explicitly seeds SYSTEM_DEFINITIONID / SYSTEM_JOBID / SYSTEM_JOBNAME with values that disagree with the script's deterministic primaries before invoking the script. Without the script fix it fails the same way the mseng build did; with the fix it passes byte-identical. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Refresh v5.0.0 release-history layout + add prefix legend (#2939) Three changes, all in ReleaseHistory.md: 1. Add a prefix legend at the top of the file. Codifies the six prefixes (DEP / BRK / BUG / NEW / PRF / FUN) and the 'BRK leads each section' rule. Footnote notes that older sections may predate the convention. 2. Reorder the v5.0.0 section so all BRK bullets lead (BRK -> NEW -> BUG). Pure line shuffling; relative order preserved within each group. 3. Normalize the lone 'BUGFIX:' bullet in v4.6.4 to 'BUG:' (matches the legend's canonical form). 
The deep-history 'BUGFIX, BRK:' entry in the v1.x section is left alone — that's immutable shipped state. No code or schema changes; ReleaseHistory.md only. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Replace emit-init-run flags with SARIF Run JSON contract A consumer agent reports the existing 14 typed CLI flags can't express multiple versionControlProvenance entries (one of which carries a properties bag documenting skills in play). Modeling every field as a flag explodes the surface; the peer emit verbs (add-result, add-notification, add-reporting-descriptor) already accept fully-formed SARIF JSON via --input or stdin for exactly this reason, with a documented rationale that applies more strongly to the run header than to a single result. Replace the v5.0.0 flag surface on emit-init-run with the same input/stdin payload contract. EmitInitRunOptions shrinks to three properties (OutputFilePath, InputFilePath, ForceOverwrite). The SarifEventReplayer's documented partial-Run shape (tool, language, columnKind, defaultEncoding, defaultSourceLanguage, originalUriBaseIds, versionControlProvenance, automationDetails, baselineGuid, redactionTokens, ...) is now reachable end-to-end through the verb. Receipt-time validators (no filesystem side-effects on rejection): required non-empty-string tool.driver.name; https-only tool.driver.informationUri and versionControlProvenance[].repositoryUri; https-or-file originalUriBaseIds["SRCROOT"].uri; canonical 8-4-4-4-12 automationDetails.guid/correlationGuid; exact-match ai/origin in {generated, annotated, synthesized}; SARIF-log-document rejection; parent-shape JSON-object enforcement at every nested accessor so a JValue indexer never throws into the broad catch. 
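
The receipt-time validators listed above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the C# `EmitInitRunCommand`: `validate_run_header` and its helpers are hypothetical names, a SARIF log document is assumed to be recognizable by a top-level `runs` key, and `ai/origin` is assumed to live in the run's `properties` bag and to be checked only when present.

```python
import re
from urllib.parse import urlparse

# Canonical 8-4-4-4-12 hex GUID shape named in the commit message.
GUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
AI_ORIGINS = {"generated", "annotated", "synthesized"}


def _child(parent, key, path, errors):
    """Return parent[key] if it is a JSON object; otherwise record an error."""
    value = parent.get(key, {})
    if not isinstance(value, dict):
        errors.append(f"{path} must be a JSON object")
        return {}
    return value


def _has_scheme(value, schemes):
    return isinstance(value, str) and urlparse(value).scheme in schemes


def validate_run_header(run):
    """Hypothetical validator: returns a list of problems, empty when clean."""
    if not isinstance(run, dict):
        return ["run header must be a JSON object"]
    # Assumption: a full SARIF log is recognized by its top-level 'runs' key.
    if "runs" in run:
        return ["expected a SARIF Run object, not a SARIF log document"]

    errors = []
    tool = _child(run, "tool", "tool", errors)
    driver = _child(tool, "driver", "tool.driver", errors)
    name = driver.get("name")
    if not isinstance(name, str) or not name.strip():
        errors.append("tool.driver.name must be a non-empty string")
    info = driver.get("informationUri")
    if info is not None and not _has_scheme(info, {"https"}):
        errors.append("tool.driver.informationUri must be https")

    vcp = run.get("versionControlProvenance", [])
    if not isinstance(vcp, list):
        errors.append("versionControlProvenance must be an array")
        vcp = []
    for i, entry in enumerate(vcp):
        if not isinstance(entry, dict):
            errors.append(f"versionControlProvenance[{i}] must be a JSON object")
            continue
        uri = entry.get("repositoryUri")
        if uri is not None and not _has_scheme(uri, {"https"}):
            errors.append(f"versionControlProvenance[{i}].repositoryUri must be https")

    bases = _child(run, "originalUriBaseIds", "originalUriBaseIds", errors)
    if "SRCROOT" in bases:
        srcroot = _child(bases, "SRCROOT", 'originalUriBaseIds["SRCROOT"]', errors)
        uri = srcroot.get("uri")
        if uri is not None and not _has_scheme(uri, {"https", "file"}):
            errors.append('originalUriBaseIds["SRCROOT"].uri must be https or file')

    automation = _child(run, "automationDetails", "automationDetails", errors)
    for key in ("guid", "correlationGuid"):
        value = automation.get(key)
        if value is not None and not (isinstance(value, str) and GUID_RE.fullmatch(value)):
            errors.append(f"automationDetails.{key} must be a canonical GUID")

    props = _child(run, "properties", "properties", errors)
    origin = props.get("ai/origin")
    if origin is not None and not (isinstance(origin, str) and origin in AI_ORIGINS):
        errors.append("properties['ai/origin'] must be generated, annotated, or synthesized")
    return errors
```

The function only inspects the payload and returns diagnostics, so a caller can reject bad input before touching the filesystem, mirroring the "no filesystem side-effects on rejection" property stated above.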
ADO stamping is now JToken-direct so producer-supplied SARIF fields outside the SDK typed Run model survive the wip-line append; the existing typed-Run materialization at emit-finalize is the documented boundary at which non-typed fields are dropped, consistent with every other SDK round-trip. AdoPipelineContext.ApplyTo(Run) becomes bool TryApplyTo(Run, out string error). It stamps automationDetails.id and the four azuredevops/pipeline/build/* properties only when absent and fails-with-diagnostic on per-field conflict. The previous unconditional-overwrite contract was inert in v5.0.0 (the flag surface couldn't supply those fields) but became a footgun once JSON input could. CweGenerateSample.ps1 rewrites its emit-init-run call to construct a PowerShell hashtable -> ConvertTo-Json -Depth 32 -Compress -> stdin pipe. Both CweSample.sarif and CweGHAzDoSample.sarif regenerate byte-identically (verified by CweGeneratedSampleTests, which gates the fixtures sha-256). skills/emit-sarif-findings/SKILL.md Step 1 is rewritten to show the JSON construction; the inputs table picks up the multi-VCP and properties-bag annotations; the package constraint bumps to Sarif.Multitool >= 5.1.0. docs/multitool-usage.md's flag example is replaced with the stdin form. ReleaseHistory.md gets a new v5.1.0 UNRELEASED section with three bullets: BRK on the flag-surface removal, BRK on AdoPipelineContext.ApplyTo, NEW on the JSON-payload contract. Verification: - dotnet build src/Sarif.Sdk.sln: 0 warnings, 0 errors. - Test.UnitTests.Sarif.Multitool.Library: 217 passed, 1 skipped. - Test.UnitTests.Sarif: 896 passed, 3 skipped. - Test.UnitTests.Sarif.Driver: 140 passed, 1 skipped. - CweGeneratedSampleTests (3): pass; both fixtures byte-identical. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Stamp ReleaseHistory v5.0.1 section with nuget links src/build.props was bumped to <VersionPrefix>5.0.1</VersionPrefix> in be6fb70 (the emit-init-run JSON-contract change). 
This finishes the v5.0.1 cut by replacing the UNRELEASED placeholder header in ReleaseHistory.md with the canonical version banner (Sdk / Driver / Converters / Multitool / Multitool Library nuget links), matching the v5.0.0 format. Folded into #2942 so main is shippable the moment the promotion lands. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Trim v5.0.1 release-notes bullets to neighbor density The three bullets were 4-6 sentences each, embedding validator catalogs and finalize-time round-trip prose that belong in the PR description, not in ReleaseHistory.md. Repo style explicitly calibrates against the neighbors and asks for trim/split when a bullet exceeds ~3x — the BRK and NEW bullets here now sit at roughly the same density as the v5.0.0 rename bullets above them. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Add multitool add-invocation verb Mirrors add-result / add-notification / add-reporting-descriptor: takes a fully-formed SARIF Invocation JSON object via --input <path> or stdin and appends it to the staged event log as a SarifEventKinds.Invocation event. SarifEventReplayer strips run.invocations[] carried on the run header, so this verb is the only path producers have to populate the array. The verb imposes no schema beyond must be a JSON object (SARIF makes every field on Invocation optional); full-log shape validation lives in emit-finalize --validate. AddInvocationOptions / AddInvocationCommand follow the established pattern. Program.cs registers and dispatches the new verb. SKILL.md, docs/multitool-usage.md, and ReleaseHistory.md updated. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Drop unused System.Text using in EmitInitRunCommandTests CI's BuildAndTest.ps1 invokes dotnet build with --no-incremental and /p:EnforceCodeStyleInBuild=true, which surfaces IDE0005 (unused using) as an error. 
Local Debug + default incremental builds skipped the check and let the unused System.Text directive ride into be6fb70. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix NOVEL ruleId example and clarify enricher ownership of region.snippet Two corrections to docs/ai/generating-sarif.md flagged by an external AI-authoring feedback session that retargeted against Sarif.Multitool@5.0.1: 1. The Novel-findings subsection used the slash form ('NOVEL/<sub-id>' on result.ruleId, bare 'NOVEL' on the descriptor) - which AddResultCommand and AddReportingDescriptorCommand reject at receipt. The canonical form (per docs/AI-RuleId-Convention.md and AIRuleIdConvention.s_novelPrefix) is the dash-flat 'NOVEL-<sub-id>'; descriptor.id and result.ruleId are byte-identical. The obsolete 'ruleIndex required for NOVEL' paragraph is removed - each NOVEL- now has a unique id, so the SARIF 3.19.23 non- unique-id workaround no longer applies. 2. The Code Context subsection told AI tools to populate region.snippet and contextRegion.snippet on every finding. emit-finalize already runs InsertOptionalDataVisitor with RegionSnippets | ContextRegionSnippets | ComprehensiveRegionProperties | Hashes, reading the file from disk and filling these fields itself. The producer SHOULD emit region.startLine and region.endLine; the enricher owns everything else. Pre-populating wastes tokens and drift-risks the consumer's view of the file. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Restore RuleKind.Ado as [Obsolete] alias for RuleKind.GHAzDO (#2945) #2928 renamed RuleKind.Ado to RuleKind.GHAzDO without leaving a back-compat alias. Restore Ado as an [Obsolete] alias resolving to the same underlying value (4), so pre-rename source still compiles and '--rule-kind ado' continues to bind on the multitool CLI via the existing case-insensitive enum parser. The obsolete-warning steers new callers off the deprecated spelling without breaking them. 
Two new [Fact] tests pin the alias contract (same value; case-insensitive parse of 'ado' resolves to GHAzDO). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Trigger Validate workflow on dev-targeted pushes and PRs PRs targeting dev were silently skipping the build-and-test / check-format / build-multitool-for-npm jobs because the workflow filter was scoped to main only. Adding dev to both the push and pull_request branch lists so CI gates dev work the same way it gates main work. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Consolidate v5.0.1 release-notes for actual ship v5.0.1 was tagged but never published to NuGet, so the bullet that lived under an unreleased v5.0.2 header folds back into v5.0.1 — that's the version that will actually ship from this PR. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Make autocrlf-sensitive tests deterministic across line-ending configurations (#2859) On Windows agents with `core.autocrlf=input` (LF on disk, CRLF Environment.NewLine), several tests compared values normalized to `Environment.NewLine` against C# verbatim string literals whose embedded newlines were LF on disk. They passed on default Windows (`autocrlf=true`: CRLF on disk = CRLF NewLine) and on Linux/Mac (`autocrlf=input`: LF on disk = LF NewLine) but failed on the cross-grained combo that no CI configuration exercises. Two principled moves: 1. Boundary normalization in `TestAssetResourceExtractor.GetResourceText` — canonicalize the read text to `Environment.NewLine` (`\r\n` -> `\n` -> `Environment.NewLine`). This is the single point where text resources enter the test harness, so every consumer (`FileDiffingUnitTests`, `InsertOptionalDataVisitorTests`, and ad-hoc callers) inherits the normalization for free. 2. 
Rewrite the affected literal-string assertions to express newlines explicitly with `Environment.NewLine` rather than rely on the source file's on-disk line endings: `string.Join(Environment.NewLine, ...)` for multi-line bodies, `$@"...{Environment.NewLine}..."` for short ones. Touches `StackTests`, `WebRequestTests`, `WebResponseTests`, `AndroidStudioConverterTests`, `FortifyUtilitiesTests`. `InsertOptionalDataVisitor.txt` is embedded as a resource and hashed by a `[Trait(TestTraits.WindowsOnly, "true")]` test where the on-disk hash must remain stable, so `.gitattributes` pins that file to `eol=crlf`. Co-authored-by: Michael C. Fanning <mikefan@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Drop AI1015 ProvideRunDefaultSourceLanguage from AI authoring guidance (#2948) * Drop AI1015 ProvideRunDefaultSourceLanguage from AI authoring guidance The partition-by-language model proved insufficient as a baseline-updating strategy (the earlier hypothesis that we could replace AI results for one language while retaining results for another on receipt of a new log file). Remove the MUST that an AI run set 'run.defaultSourceLanguage' and partition by '(repository, branch, language)' tuple, along with the 'Run partitioning by language' section in 'docs/ai/generating-sarif.md' and the AI1015 row in the rule table. 'defaultSourceLanguage' remains an accepted optional SARIF Run field for viewer rendering; we simply no longer mandate it. No code references existed for AI1015, so this is a pure doc walkback. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Reclassify AI1015 drop as BUG and trim bullet Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Fix SARIF1012 NRE when result.ruleId does not resolve (#2944) (#2949) `SARIF1012.MessageArgumentsMustBeConsistentWithRule` previously threw a `NullReferenceException` when `result.message.id` was set but `result.ruleId` did not resolve to a rule in `tool.driver.rules[]` (no match by id, ruleIndex, or hierarchical base). The null-guard at lines 52-53 inspected `currentRules` (the collection) rather than the resolved `rule` instance, and used a short-circuiting null-conditional check that silently fell through to the diagnostic-emit branch's indexer access. Replace the guard with an explicit three-prong check: rule == null || rule.MessageStrings == null || !rule.MessageStrings.ContainsKey(result.Message.Id) All three null cases now emit the existing `MessageIdMustExist` diagnostic with the unresolved rule id (or `null`). Regression fixture: adds a 4th result to `SARIF1012.MessageArgumentsMustBeConsistentWithRule_Invalid.sarif` that references a non-existent `NoSuchRule` with `message.id: AnyId`, plus the corresponding expected diagnostic at `startLine: 53`. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Replace Generate-CweTaxonomy.ps1 with cross-platform Python regenerator (#2921) (#2950) The PowerShell-only 'scripts/Generate-CweTaxonomy.ps1' assumed a Windows / pwsh environment to regenerate the CWE taxonomy assets, which is friction for non-Windows contributors and the original complaint behind #2921. Replace it with a single 'scripts/generate_cwe_taxonomy.py' (Python 3, stdlib only) that runs identically on Linux, macOS, and Windows. The script: * Downloads 'cwec_latest.xml.zip' via 'urllib.request' into a temporary staging directory. 
* Extracts via 'zipfile' and locates the embedded 'cwec_v*.xml'. * Parses every weakness (handling the CWE XML namespace), buckets by Status, sorts by numeric ID, derives the SARIF2012-conformant Pascal-case identifier (preferring the parenthesized common name when present), resolves the View-1000 ChildOf Primary parent, and emits the four-section help markdown (Description / Extended Description / Common Consequences / Potential Mitigations). * Writes 'CweTaxonomy.sarif' and 'CweTaxonomy.brief.md' in place under 'src/Sarif/Taxonomies/' with UTF-8 (no BOM) and LF line endings so the embedded resources hash identically regardless of host OS. Default invocation requires no arguments and downloads from MITRE: python3 scripts/generate_cwe_taxonomy.py For offline / testing scenarios, '--xml path/to/cwec_v*.xml' bypasses the download; '--source-url' overrides the MITRE URL; '--output-dir' overrides the artifact destination. '.gitattributes' retains the LF pinning on the two generated artifacts. 'src/Sarif/Taxonomies/CweReadme.md' Regeneration section is rewritten to show the single Python invocation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Narrow SARIF1001 case-fold relaxation to AI-origin notification descriptors (#2951) (#2955) Initial revision of this fix universally relaxed SARIF1001's id/name comparison to 'StringComparison.Ordinal' on the grounds that SARIF v2.1.0 § 3.49.7 only forbids strict-identical pairs. Per maintainer review, that read overshoots: the case-fold comparison is an authorial SHOULD layered above the spec MUST, and SHOULD layers are precisely what validators add value with. Removing the SHOULD globally weakens the typo-catcher for hand-authored descriptors. The narrower carve-out: AI notification taxonomies (issue #2952) deliberately pair a SCREAMING-CAPS opaque id with the corresponding PascalCase end-user name (e.g. 'DECISION' / 'Decision'). 
That convention is machine-coordinated and specific to AI emitters; it does not apply to hand-authored rule and taxon descriptors.

The cut is the intersection of two existing context signals:

  1. 'IsAIOriginRun()' -- the run carries 'properties["ai/origin"]', the same gate used by SARIF2002, SARIF2009 (the literal peer rule for identifier conventions), SARIF2014, and SARIF2015.
  2. 'Context.CurrentReportingDescriptorKind == Notification' -- the descriptor was reached via 'tool.driver.notifications[]', not 'rules[]' or 'taxa[]'.

AI rule ids are constrained by AI1012 to 'BASE/sub-id' or 'NOVEL-<sub-id>' forms whose hyphens / slashes cannot case-fold-collide with any PascalCase name, so extending the carve-out to rules would be a no-op for AI and a regression for hand-authored taxonomies.

Strict-identical comparisons remain unconditional everywhere (spec MUST).

The Invalid functional fixture now spans three runs covering each boundary of the carve-out:

    run[0]  non-AI     rules          RULE0001/RULE0001 (spec MUST)
                                      RULE0002/RULE0002 (spec MUST)
    run[1]  AI-origin  notifications  STRICT/STRICT (spec MUST under AI)
                       rules          DECISION/Decision (rules-not-exempt)
    run[2]  non-AI     notifications  DECISION/Decision (non-AI-not-exempt)

The Valid functional fixture pairs a non-AI tool with hand-authored rules and an AI-origin tool whose notifications carry 'DECISION/Decision' and 'RULE-COVERAGE-GAP/RuleCoverageGap' -- the latter pair documents the taxonomy convention even though hyphens already keep it case-fold-distinct.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add GHAzDO1021 (ProvideShortBranchNameInVcp) (#2954) (#2958)

Add a GHAzDO validation rule for run.versionControlProvenance[].branch values that start with refs/<class>/, using ^refs/[^/]+/ to detect full-ref branch names and recommending the stripped short form.

The AdvSec Service silently drops VCP entries whose branch is not a short branch name, so the rule is scoped only to versionControlProvenance[].branch.
It deliberately does not flag the full-ref branch segment embedded in run.automationDetails.id, which remains part of the existing GHAzDO1020 contract.

Add valid and invalid ValidateCommand fixtures covering short branch names and refs/heads, refs/tags, and refs/pull full refs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Enrich versionControlProvenance from CI pipeline env (ADO + GitHub Actions) (#2957) (#2959)

EmitInitRunCommand currently stamps automationDetails.id and the four `azuredevops/pipeline/build/*` property keys when `TF_BUILD=True` is detected, but it does not lift anything into `versionControlProvenance` - even though `BUILD_SOURCEBRANCH` is already in hand. AdvSec ingestion relies on VCP for branch + revision attribution, and this gap means the data lands only when a producer hand-constructs the entry into the run-header JSON. The same gap exists for AI scanners running under GitHub Actions.

Extend AdoPipelineContext to read the two optional argument vars `BUILD_REPOSITORY_URI` and `BUILD_SOURCEVERSION`, and derive a short branch name from the existing `BUILD_SOURCEBRANCH` by stripping any leading `refs/<class>/` segment (so `refs/heads/main` becomes `main`; `refs/pull/42/merge` becomes `42/merge`; `refs/tags/v1` becomes `v1`). The two new env vars are optional - absence does not degrade Complete -> Partial - but malformed presence does (URI must be absolute http(s); revision must match `^[0-9a-fA-F]{7,40}$`).

Add a parallel `GitHubActionsContext` for `GITHUB_ACTIONS=true` that mirrors the same shape but is VCP-scoped (no pipeline-identity contract today): `GITHUB_SERVER_URL` + `GITHUB_REPOSITORY` compose the repository URI, `GITHUB_SHA` supplies the revision (same hex regex), and `GITHUB_REF_NAME` is preferred over `GITHUB_REF` (stripping the same `refs/<class>/` prefix). Custom GHES servers compose correctly via trailing-slash normalization.
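
The repository-URI composition and revision check described for `GitHubActionsContext` can be sketched roughly as follows. This is a Python illustration only, not the C# implementation: the function names are hypothetical, the three-state None / Partial / Complete result is collapsed into "return None or raise", and the exact comparison the real code applies to `GITHUB_ACTIONS` is an assumption. Only the environment variable names and the hex pattern come from the commit message.

```python
import re
from urllib.parse import urlparse

# Hex pattern quoted in the commit message for revision ids.
_REVISION_RE = re.compile(r"^[0-9a-fA-F]{7,40}$")


def compose_repository_uri(server_url, repository):
    """Join GITHUB_SERVER_URL and GITHUB_REPOSITORY with exactly one slash."""
    uri = server_url.rstrip("/") + "/" + repository.lstrip("/")
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"repository URI is not absolute http(s): {uri!r}")
    return uri


def read_github_vcp_fields(env):
    """Hypothetical helper: return (repository_uri, revision_id) from env.

    Returns None when GITHUB_ACTIONS is not "true" (assumed exact match).
    Absent optional values come back as None; malformed values raise.
    """
    if env.get("GITHUB_ACTIONS") != "true":
        return None
    server = env.get("GITHUB_SERVER_URL")
    repository = env.get("GITHUB_REPOSITORY")
    uri = compose_repository_uri(server, repository) if server and repository else None
    sha = env.get("GITHUB_SHA")
    if sha is not None and not _REVISION_RE.fullmatch(sha):
        raise ValueError(f"GITHUB_SHA is not a 7-40 character hex revision: {sha!r}")
    return uri, sha
```

Passing the environment in as a plain mapping (rather than reading `os.environ` directly) keeps the sketch testable without touching ambient CI state, which is the same concern the fixture-isolation notes in this commit raise.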
EmitInitRunCommand grows a `TryStampVcp` JObject-direct stamper that mirrors the existing `TryStampAdoContext` shape and operates on three input shapes:

  1. `versionControlProvenance` absent or empty array -> synthesize a new entry only when `repositoryUri` was detected (anchor field). Branch/revision without a repository URI is informationally thin and the synthesized entry would not bind to a repo for ingestion.
  2. `versionControlProvenance` contains exactly one entry -> enrich missing fields; fail with a per-field conflict diagnostic when any supplied field disagrees with the detected pipeline value. Repository URI equality treats scheme/host case-insensitively (RFC 3986) via `Uri.TryCreate` round-trip; branch and revision are byte-wise.
  3. `versionControlProvenance` contains multiple entries -> leave untouched. The caller has declared a multi-repo shape and we refuse to guess which entry names the pipeline's source repo.

A `TryResolveVcpFields` orchestrator layers the two sources before stamping: ADO is the higher-priority source per the documented "env-takes-priority" rule, GHA fills any gap where ADO is silent, and fields populated on both sources MUST agree or the verb aborts with a diagnostic naming both sources. The stamper itself is source-agnostic - it takes the resolved (repositoryUri, revisionId, branch) triple and does not know which env produced each field.

Probe-before-write semantics on the single-entry path leave the JObject unchanged when a conflict is detected, matching the existing `TryStampAdoContext` contract - a half-stamped VCP is worse than a clean refusal.

Why no disk-git fallback?

The verb deliberately does not shell out to `git.exe` to recover this data from the working tree.
The two CI envs cover both surfaces an AI scanner lands in, `add-result`'s producer-supplied JSON is the universal escape hatch for everything else, and adding a soft runtime dependency on `git.exe` (with its own failure modes around shallow clones, detached HEADs, and non-existent branches) would make the verb's output depend on disk state. Producers running locally outside CI either set the env vars themselves or populate VCP directly in the input JSON.

CI fixture isolation

`CweGenerateSample.ps1`'s `\` map gains the two new optional ADO vars (`BUILD_REPOSITORY_URI` set to the resolved git remote URL; `BUILD_SOURCEVERSION` set to the same zero-SHA placeholder the hardcoded VCP entry carries) so the -GHAzDO variant stamps a fully populated ADO env shape and AdoPipelineContext detects no conflict against the supplied VCP. The same map also adds `\` entries for `GITHUB_ACTIONS` / `GITHUB_SERVER_URL` / `GITHUB_REPOSITORY` / `GITHUB_SHA` / `GITHUB_REF_NAME` / `GITHUB_REF` so that ambient GitHub Actions env on the macOS CI runner (which sets a real `GITHUB_SHA`) cannot trip GitHubActionsContext into reporting a revisionId that conflicts with the zero-SHA placeholder.

Without this scrubbing, both `CweSample_Sarif_IsByteIdenticalToCweGenerateSampleScriptOutput` and `CweGHAzDoSample_Sarif_IsByteIdenticalToCweGenerateSampleScriptOutput` break under macos-latest. The same property is now gated by `CweGHAzDoSample_RegenerationSucceeds_WhenAmbientGitHubActionsEnvVarsConflict`, the GHA-side parallel of the existing ambient-ADO regression test.

Closes #2957.

Tests: 7 new EmitInitRunCommandTests covering GHA-only stamping, ADO+GHA agreement, the gap-fill path, cross-source disagreement on each field, GHA partial-env refusal, and producer-supplied conflicts under GHA. 11 new GitHubActionsContextTests covering detection states, malformed inputs, REF_NAME vs REF precedence, and URI normalization. 1 new CweGeneratedSampleTests fact gating ambient-GHA-env isolation.
The original 11 ADO VCP EmitInitRunCommandTests and 9 AdoPipelineContextTests still green. 263/264 in Test.UnitTests.Sarif.Multitool.Library and 4/4 CweGeneratedSampleTests pass locally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Stamp ReleaseHistory v5.0.2 section with nuget links and bump VersionPrefix (#2960)

Promotes the v5.0.2 work to ship-ready:

- `src/build.props`: `VersionPrefix` 5.0.1 -> 5.0.2, `PreviousVersionPrefix` 5.0.0 -> 5.0.1.
- `ReleaseHistory.md`: stamp the `v5.0.2` header with nuget links for Sdk / Driver / Converters / Multitool / Multitool Library (UNRELEASED -> shipped).
- `skills/emit-sarif-findings/SKILL.md`: bump the recommended Sarif.Multitool minimum from 5.0.1 to 5.0.2. v5.0.2 is the first release where `emit-init-run` enriches versionControlProvenance from the CI pipeline environment (Azure DevOps + GitHub Actions), which the skill's required commit-sha / branch / repo-uri inputs are stamped from automatically.

Six v5.0.2 bullets ship:

  * BRK: scripts/Generate-CweTaxonomy.ps1 -> scripts/generate_cwe_taxonomy.py (#2921 / #2950).
  * NEW: `emit-init-run` enriches `versionControlProvenance` from ADO + GitHub Actions env (#2957 / #2959).
  * NEW: GHAzDO1021 `ProvideShortBranchNameInVcp` (#2954 / #2958).
  * BUG: Drop AI1015 `ProvideRunDefaultSourceLanguage` (#2948).
  * BUG: SARIF1012 NRE on unresolved ruleId (#2944 / #2949).
  * BUG: SARIF1001 case-fold relaxation for AI notification descriptors (#2951 / #2955).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert GHAzDO1021 + short-form branch normalization (false premise) (#2962)

The GHAzDO1021 rule (``ProvideShortBranchNameInVcp``) and the related short-form branch-name normalization in the ADO and GHA env enrichers were built on a false premise.
The original observation -- that the GHAzDO/AdvSec ingestion service silently dropped ``run.versionControlProvenance[]`` entries whose ``branch`` started with ``refs/<class>/`` -- turned out to be job-processing latency misread as validation failure. A product engineer from the AdvSec team has since confirmed that ingestion accepts both short (``main``) and long (``refs/heads/main``) shapes, so there is nothing to warn about and nothing to normalize.

This change reverts the lot before v5.0.2 ships:

  * Delete ``GHAzDO1021.ProvideShortBranchNameInVcp`` rule plus its four Valid/Invalid Inputs and ExpectedOutputs fixtures, the resx + Designer entries, and the ``RuleId`` constant.
  * ``AdoPipelineContext``: drop ``s_branchRefPrefixRegex``, ``BranchShortName``, and ``NormalizeBranchRef``. ``TryDetect`` passes ``BUILD_SOURCEBRANCH`` through verbatim; ``BranchRef`` is the sole branch property, used directly when stamping VCP.
  * ``GitHubActionsContext``: drop the ``GITHUB_REF_NAME`` fallback entirely (the runner always sets both env vars, so this is invisible in production but keeps the property honestly long-form). Rename ``BranchShortName`` -> ``BranchRef``; new ``TryReadOptionalBranchRef`` is a pass-through.
  * ``EmitInitRunCommand``: rename ``vcpBranchShortName`` -> ``vcpBranch``; ``TryResolveVcpFields`` / ``TryStampVcp`` parameter renames; doc comments updated.
  * Tests: ``ValidateCommandTests`` drops the two GHAzDO1021_* methods; ``AdoPipelineContextTests`` renames the four "BranchShortName_Strips*" tests to "Passes*Through" and asserts only ``BranchRef`` (long form); ``GitHubActionsContextTests`` switches to ``GITHUB_REF`` setup, drops the two RefName-preference tests in favour of three pass-through tests; ``EmitInitRunCommandTests`` updates four env setups and six assertions to the long form.
  * ``CweGenerateSample.ps1`` supplies ``branch = 'refs/heads/main'`` so the GHAzDO-variant sample agrees with the env-derived long form on cross-source check; ``CweSample.sarif`` and ``CweGHAzDoSample.sarif`` regenerated.
  * ``ReleaseHistory.md`` v5.0.2 UNRELEASED: drop the GHAzDO1021 NEW bullet; soften the VCP-enrichment bullet wording to reflect pass-through semantics (no short-form derivation).

No version bump: v5.0.2 is unreleased (the dev->main promote PR is open and blocked); this lands as part of the same release window.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Derek Morris <Penguinwizzard@users.noreply.github.com>
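
The layering and stamping behaviour that the #2959 message attributes to `TryResolveVcpFields` and `TryStampVcp` can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the C# code: the function names and the dict-based run object are hypothetical, conflicts are raised as `ValueError` instead of being reported as diagnostics, and repository URIs are compared as plain strings rather than with the case-insensitive scheme/host comparison the message describes. Branch values are treated as opaque strings, consistent with the pass-through semantics left in place by #2962.

```python
FIELDS = ("repositoryUri", "revisionId", "branch")


def resolve_vcp_fields(ado, gha):
    """Layer two env sources: ADO wins, GHA fills gaps, overlaps must agree."""
    resolved = {}
    for field in FIELDS:
        ado_value, gha_value = ado.get(field), gha.get(field)
        if ado_value is not None and gha_value is not None and ado_value != gha_value:
            raise ValueError(
                f"{field}: ADO reports {ado_value!r} but GitHub Actions reports {gha_value!r}"
            )
        value = ado_value if ado_value is not None else gha_value
        if value is not None:
            resolved[field] = value
    return resolved


def stamp_vcp(run, resolved):
    """Apply resolved fields to run['versionControlProvenance'] in place.

    Returns True when the run was modified. Raises ValueError, leaving the
    run untouched, when a supplied field disagrees with the pipeline value.
    """
    entries = run.get("versionControlProvenance") or []
    if len(entries) > 1:
        return False  # multi-repo shape: refuse to guess which entry to enrich
    if not entries:
        if "repositoryUri" not in resolved:
            return False  # no anchor field, so nothing is synthesized
        run["versionControlProvenance"] = [dict(resolved)]
        return True
    entry = entries[0]
    # Probe every field before writing so a conflict leaves the entry unchanged.
    for field, value in resolved.items():
        if field in entry and entry[field] != value:
            raise ValueError(
                f"{field}: supplied {entry[field]!r} conflicts with pipeline value {value!r}"
            )
    missing = {f: v for f, v in resolved.items() if f not in entry}
    entry.update(missing)
    return bool(missing)
```

Keeping the stamper unaware of which source produced each field mirrors the source-agnostic design the message calls out: all cross-source policy lives in the resolver, so adding a third CI environment would only touch that one function.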
Promotes the v5.0.2 work to ship-ready.
Changes
- `src/build.props`: `VersionPrefix` 5.0.1 -> 5.0.2; `PreviousVersionPrefix` 5.0.0 -> 5.0.1.
- `ReleaseHistory.md`: stamp the `v5.0.2` header with nuget links for Sdk / Driver / Converters / Multitool / Multitool Library (UNRELEASED -> shipped).
- `skills/emit-sarif-findings/SKILL.md`: bump the recommended `Sarif.Multitool` minimum from 5.0.1 to 5.0.2. v5.0.2 is the first release where `emit-init-run` enriches `versionControlProvenance` from the CI pipeline environment (Azure DevOps + GitHub Actions), from which the skill's required commit-sha / branch / repo-uri inputs are stamped automatically.
Six bullets ship in v5.0.2
- BRK: Replace `scripts/Generate-CweTaxonomy.ps1` with a Python 3 (stdlib only) regenerator runnable on Linux/macOS/Windows without `pwsh` (#2921 / #2950).
- NEW: `multitool emit-init-run` enriches `run.versionControlProvenance` from the CI pipeline environment. Azure DevOps (`TF_BUILD=True`) and GitHub Actions (`GITHUB_ACTIONS=true`) are both supported; when both env sets are populated, cross-source agreement is enforced and the verb aborts on disagreement. Adds `Microsoft.CodeAnalysis.Sarif.Multitool.GitHubActionsContext` (#2957 / #2959).
- NEW: GHAzDO1021 `ProvideShortBranchNameInVcp` flags VCP branch values that begin with a `refs/<class>/` prefix (#2954 / #2958).
- BUG: Drop AI1015 `ProvideRunDefaultSourceLanguage` (#2948).
- BUG: `SARIF1012.MessageArgumentsMustBeConsistentWithRule` no longer throws `NullReferenceException` when `result.message.id` is set but `result.ruleId` does not resolve (#2944 / #2949).
- BUG: `SARIF1001.RuleIdentifiersMustBeValid` now exempts AI-origin runs from the case-fold-equality check on notification descriptors, mirroring SARIF2009. Strict-identical id/name pairs continue to fire universally; the case-fold convention still applies to rules and taxa regardless of run origin (#2951 / #2955).
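
The SARIF1001 carve-out in the last bullet reduces to a small decision function. The following Python sketch models that decision only; the function and parameter names are hypothetical, and `str.casefold()` stands in for whatever case-insensitive comparison the C# rule actually uses.

```python
def sarif1001_fires(descriptor_id, name, *, ai_origin_run, is_notification):
    """Model of the id/name check: True means the pair is reported."""
    if descriptor_id == name:
        # Strict-identical pairs are the spec MUST and are never exempt.
        return True
    if descriptor_id.casefold() != name.casefold():
        # Not equal even ignoring case: nothing to report.
        return False
    # Case-fold-equal pairs are exempt only for notification
    # descriptors inside an AI-origin run.
    return not (ai_origin_run and is_notification)
```

Checked against the pairs named in the commit message, `DECISION` / `Decision` is reported for rules in an AI-origin run and for notifications in a non-AI run, but not for notifications in an AI-origin run, while `STRICT` / `STRICT` is reported in every context.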
Verification
- `Test.UnitTests.Sarif.Multitool.Library`: 263/264 (1 pre-existing skip).
- `Test.FunctionalTests.Sarif`: 126/126.
- `Test.UnitTests.Sarif`: green (includes the four `CweGeneratedSampleTests` covering both ADO and GHA ambient-env isolation).
- windows-latest, build-multitool-for-npm, check-format, license/cla all green at merge time.