
Replace Generate-CweTaxonomy.ps1 with cross-platform Python regenerator (#2921) #2950

Merged
michaelcfanning merged 1 commit into dev from fix-2921-cwe-script-bash
May 27, 2026

Conversation

@michaelcfanning
Member

@michaelcfanning michaelcfanning commented May 27, 2026

Resolves #2921. The PowerShell-only scripts/Generate-CweTaxonomy.ps1 assumed a Windows / pwsh environment to regenerate the CWE taxonomy assets — friction for non-Windows contributors and the original complaint behind #2921.

This PR replaces it with a single scripts/generate_cwe_taxonomy.py (Python 3, stdlib only) runnable identically on Linux, macOS, and Windows. Default invocation requires no arguments and downloads from MITRE:

python3 scripts/generate_cwe_taxonomy.py

What the script does

  • Downloads cwec_latest.xml.zip via urllib.request into a temporary staging directory
  • Extracts via zipfile and locates the embedded cwec_v*.xml
  • Parses every weakness (handling the CWE XML namespace), buckets by Status, sorts by numeric ID, derives the SARIF2012-conformant Pascal-case identifier (preferring the parenthesized common name when present), resolves the View-1000 ChildOf Primary parent, and emits the four-section help markdown (Description / Extended Description / Common Consequences / Potential Mitigations)
  • Writes CweTaxonomy.sarif and CweTaxonomy.brief.md in place under src/Sarif/Taxonomies/ with UTF-8 (no BOM) and LF line endings so the embedded resources hash identically regardless of host OS
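
The extract, parse, bucket, and sort steps above can be sketched as follows. This is a minimal illustration and not the code in scripts/generate_cwe_taxonomy.py: the function names are hypothetical, the sample catalog is synthetic, and the namespace is stripped generically rather than by hard-coding the CWE namespace URI.

```python
import io
import xml.etree.ElementTree as ET
import zipfile
from fnmatch import fnmatch


def find_catalog_name(archive: zipfile.ZipFile) -> str:
    """Return the first archive member whose file name matches cwec_v*.xml."""
    for name in archive.namelist():
        if fnmatch(name.rsplit("/", 1)[-1], "cwec_v*.xml"):
            return name
    raise FileNotFoundError("no cwec_v*.xml member in archive")


def read_catalog(zip_bytes: bytes) -> bytes:
    """Open a downloaded zip held in memory and return the embedded catalog XML."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(find_catalog_name(archive))


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def bucket_weaknesses(xml_bytes: bytes) -> dict:
    """Group (numeric id, name) pairs by Status; each bucket is sorted by numeric id."""
    buckets = {}
    for element in ET.fromstring(xml_bytes).iter():
        if local_name(element.tag) != "Weakness":
            continue
        status = element.get("Status", "Unknown")
        entry = (int(element.get("ID")), element.get("Name", ""))
        buckets.setdefault(status, []).append(entry)
    for entries in buckets.values():
        entries.sort()
    return buckets


# Synthetic catalog for illustration only; the namespace URI here is made up.
SAMPLE = b"""<Weakness_Catalog xmlns="http://example.invalid/cwe"><Weaknesses>
<Weakness ID="89" Name="SQL Injection" Status="Stable"/>
<Weakness ID="78" Name="OS Command Injection" Status="Stable"/>
<Weakness ID="1004" Name="Sensitive Cookie" Status="Incomplete"/>
</Weaknesses></Weakness_Catalog>"""
```

Sorting on the integer ID (rather than the string) keeps CWE-78 ahead of CWE-1004, which a lexical sort would not.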

Flags

| Flag | Purpose |
| --- | --- |
| (none) | Download from MITRE, regenerate in place |
| `--xml <path>` | Use a pre-extracted XML (offline / testing) |
| `--source-url <url>` | Override the MITRE download URL |
| `--output-dir <path>` | Override the artifact destination |
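
A command-line surface with these flags could be declared with argparse roughly as below. This is a sketch, not the script's actual parser: the default URL and default output directory are assumptions inferred from the description, and the help strings are paraphrased.

```python
import argparse

# Assumed default; the PR only states that cwec_latest.xml.zip is downloaded from MITRE.
DEFAULT_SOURCE_URL = "https://cwe.mitre.org/data/xml/cwec_latest.xml.zip"
# Assumed default, based on "in place under src/Sarif/Taxonomies/".
DEFAULT_OUTPUT_DIR = "src/Sarif/Taxonomies"


def build_parser() -> argparse.ArgumentParser:
    """Declare the three optional flags; with no arguments the defaults apply."""
    parser = argparse.ArgumentParser(description="Regenerate the CWE taxonomy artifacts.")
    parser.add_argument("--xml", metavar="PATH", default=None,
                        help="use a pre-extracted XML instead of downloading")
    parser.add_argument("--source-url", metavar="URL", default=DEFAULT_SOURCE_URL,
                        help="override the MITRE download URL")
    parser.add_argument("--output-dir", metavar="PATH", default=DEFAULT_OUTPUT_DIR,
                        help="override the artifact destination")
    return parser


defaults = build_parser().parse_args([])
offline = build_parser().parse_args(["--xml", "cwec_v0.0.xml", "--output-dir", "out"])
```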

Why a Python rewrite instead of keeping the PS1 (or adding a bash wrapper)

The original draft of this PR added a bash wrapper around a Python core and kept the PS1 untouched — three scripts, two engines (PS1 + Python both doing full parse + emit). That carried real drift risk between the engines with no enforcement mechanism. The conversation that drove this rewrite concluded: collapse to one engine in the language that's universally available without install friction.

  • PS1 as engine on Linux requires pwsh install — friction
  • bash as engine requires xmlstarlet + jq (or awk gymnastics) — friction
  • Python is on macOS / Linux out of the box and is one-installer on Windows, parses XML in stdlib, and pretty-prints JSON in stdlib

Other

  • .gitattributes retains the LF pinning on CweTaxonomy.sarif and CweTaxonomy.brief.md; the *.sh eol=lf line is dropped (no shell scripts remain in scripts/)
  • src/Sarif/Taxonomies/CweReadme.md Regeneration section is rewritten to show the single Python invocation
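
The LF pinning only holds if the generator itself never writes CRLF or a BOM. One way to get that in Python is sketched below; `write_text_lf` is a hypothetical helper name, and the actual script may achieve the same result differently.

```python
import tempfile
from pathlib import Path


def write_text_lf(path: Path, text: str) -> None:
    """Write text as UTF-8 without a BOM, with LF line endings on every host OS."""
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # newline="\n" disables the platform translation that would emit CRLF on Windows;
    # encoding="utf-8" (unlike "utf-8-sig") never prepends a byte-order mark.
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(normalized)


with tempfile.TemporaryDirectory() as staging:
    target = Path(staging) / "example.md"
    write_text_lf(target, "# Title\r\nbody\n")
    written = target.read_bytes()
```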

Smoke

End-to-end smoke test against synthetic CWE XML packaged as a zip and downloaded via file://: download / extract / parse / emit all clean. Real MITRE catalog regen still needs to be run by a maintainer at next refresh to confirm byte parity with the existing committed artifacts.

@michaelcfanning michaelcfanning requested a review from cfaucon as a code owner May 27, 2026 16:06
@michaelcfanning michaelcfanning force-pushed the fix-2921-cwe-script-bash branch 2 times, most recently from a76040c to b5da34c Compare May 27, 2026 16:50
…or (#2921)

The PowerShell-only 'scripts/Generate-CweTaxonomy.ps1' assumed a Windows /
pwsh environment to regenerate the CWE taxonomy assets, which is friction
for non-Windows contributors and the original complaint behind #2921.

Replace it with a single 'scripts/generate_cwe_taxonomy.py' (Python 3,
stdlib only) that runs identically on Linux, macOS, and Windows. The
script:

* Downloads 'cwec_latest.xml.zip' via 'urllib.request' into a temporary
  staging directory.
* Extracts via 'zipfile' and locates the embedded 'cwec_v*.xml'.
* Parses every weakness (handling the CWE XML namespace), buckets by
  Status, sorts by numeric ID, derives the SARIF2012-conformant
  Pascal-case identifier (preferring the parenthesized common name when
  present), resolves the View-1000 ChildOf Primary parent, and emits
  the four-section help markdown (Description / Extended Description /
  Common Consequences / Potential Mitigations).
* Writes 'CweTaxonomy.sarif' and 'CweTaxonomy.brief.md' in place under
  'src/Sarif/Taxonomies/' with UTF-8 (no BOM) and LF line endings so the
  embedded resources hash identically regardless of host OS.
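
One plausible reading of the Pascal-case derivation is sketched below. The exact rules the script applies are not spelled out in this PR, so the regular expressions and the casing of acronyms here are assumptions; `pascal_case_identifier` is a hypothetical name.

```python
import re


def pascal_case_identifier(name: str) -> str:
    """Prefer a trailing ('Common Name') when present, then join the words in Pascal case."""
    match = re.search(r"\('([^']+)'\)\s*$", name)
    source = match.group(1) if match else name
    words = re.findall(r"[A-Za-z0-9]+", source)
    # Assumption: acronyms are folded to one leading capital (SQL -> Sql).
    return "".join(word[:1].upper() + word[1:].lower() for word in words)


long_name = "Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')"
identifier = pascal_case_identifier(long_name)
```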

Default invocation requires no arguments and downloads from MITRE:

    python3 scripts/generate_cwe_taxonomy.py

For offline / testing scenarios, '--xml path/to/cwec_v*.xml' bypasses
the download; '--source-url' overrides the MITRE URL; '--output-dir'
overrides the artifact destination.

'.gitattributes' retains the LF pinning on the two generated artifacts.
'src/Sarif/Taxonomies/CweReadme.md' Regeneration section is rewritten
to show the single Python invocation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@michaelcfanning michaelcfanning force-pushed the fix-2921-cwe-script-bash branch from b5da34c to 5817331 Compare May 27, 2026 16:51
@michaelcfanning michaelcfanning changed the title Add bash variant for CWE taxonomy regeneration; format the PS1 (#2921) Replace Generate-CweTaxonomy.ps1 with cross-platform Python regenerator (#2921) May 27, 2026
@michaelcfanning michaelcfanning merged commit 2baa932 into dev May 27, 2026
6 checks passed
michaelcfanning added a commit that referenced this pull request May 28, 2026
…Prefix (#2960)

Promotes the v5.0.2 work to ship-ready:

- `src/build.props`: `VersionPrefix` 5.0.1 -> 5.0.2,
  `PreviousVersionPrefix` 5.0.0 -> 5.0.1.
- `ReleaseHistory.md`: stamp the `v5.0.2` header with nuget links
  for Sdk / Driver / Converters / Multitool / Multitool Library
  (UNRELEASED -> shipped).
- `skills/emit-sarif-findings/SKILL.md`: bump the recommended
  Sarif.Multitool minimum from 5.0.1 to 5.0.2. v5.0.2 is the first
  release where `emit-init-run` enriches versionControlProvenance
  from the CI pipeline environment (Azure DevOps + GitHub Actions),
  which the skill's required commit-sha / branch / repo-uri inputs
  are stamped from automatically.

Six v5.0.2 bullets ship:
* BRK: scripts/Generate-CweTaxonomy.ps1 -> scripts/generate_cwe_taxonomy.py
  (#2921 / #2950).
* NEW: `emit-init-run` enriches `versionControlProvenance` from
  ADO + GitHub Actions env (#2957 / #2959).
* NEW: GHAzDO1021 `ProvideShortBranchNameInVcp` (#2954 / #2958).
* BUG: Drop AI1015 `ProvideRunDefaultSourceLanguage` (#2948).
* BUG: SARIF1012 NRE on unresolved ruleId (#2944 / #2949).
* BUG: SARIF1001 case-fold relaxation for AI notification descriptors
  (#2951 / #2955).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
michaelcfanning added a commit that referenced this pull request May 28, 2026
* emit-init-run: auto-stamp ADO pipeline automationDetails from env + GHAzDO sample (#2929)

* emit-init-run: auto-stamp ADO pipeline automationDetails from env + GHAzDO sample

Adds AdoPipelineContext, which detects an Azure DevOps pipeline
execution context from the standard predefined environment variables
and stamps run.automationDetails so producers that run inside ADO
pipelines automatically satisfy GHAzDO1019 and GHAzDO1020 with no
additional CLI flags.

- TryDetect is three-state (None / Partial / Complete). Partial fails
  loudly with a per-variable diagnostic before any file-system side
  effects so a misconfigured pipeline never emits a half-stamped SARIF.
- ApplyTo writes the canonical
  azuredevops/pipeline/build/<org>/<projectId>/<buildDefId>/<phaseId>/<branchRef>/<buildId>
  id and the four azuredevops/pipeline/build/* property keys ADO
  Advanced Security ingestion validates.
- Composes with the existing --automation-guid / --automation-correlation-guid
  flags; never overwrites a producer-supplied guid/correlationGuid.
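
The three-state detection can be illustrated in Python as follows. This is an analogy to the C# AdoPipelineContext.TryDetect, not a port: the required-variable tuple is a hypothetical subset, and the diagnostic wording is made up.

```python
from enum import Enum


class Detection(Enum):
    NONE = "none"          # no pipeline variables present: not running in ADO
    PARTIAL = "partial"    # some but not all present: fail loudly
    COMPLETE = "complete"  # everything present: safe to stamp


# Hypothetical subset; the real list of required variables lives in the SDK.
REQUIRED = ("BUILD_BUILDID", "BUILD_DEFINITIONID", "SYSTEM_TEAMPROJECTID")


def detect(env: dict) -> tuple:
    """Return (state, problems) without touching the file system."""
    present = [name for name in REQUIRED if env.get(name)]
    if not present:
        return Detection.NONE, []
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        return Detection.PARTIAL, [f"{name} is not set" for name in missing]
    return Detection.COMPLETE, []
```

Running detection before any output is written is what keeps a misconfigured pipeline from producing a half-stamped file.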

CweGenerateSample.ps1 grows a -GHAzDO switch that produces the new
CweGHAzDoSample.sarif fixture alongside the existing CweSample.sarif.
The script populates the ADO env vars for the duration of emit-init-run
so AdoPipelineContext stamps automationDetails, then patches
tool.driver.fullName post-finalize so GHAzDO1018 passes. Default-mode
runs explicitly clear those same env vars so a developer shell with
TF_BUILD=True can never drift the AI-shape fixture.

CweGHAzDoSample.sarif validates with zero errors, zero warnings, and
zero notes under --rule-kind Sarif;AI;GHAzDO. CweGeneratedSampleTests
covers both fixtures with byte-identical regression gates as separate
[Fact]s sharing one private helper.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trim ReleaseHistory bullets + add copilot-instructions.md

The two bullets I just added for env-driven ADO stamping and the
GHAzDO sample fixture were PR-description-sized, not release-note-sized.
Trimmed both to match the style of their neighbors (single self-contained
sentence + concrete names + minimal facts a downstream consumer needs).
The full narrative — three-state detection prose, env-var precedence
table, composition guarantees — already lives on PR #2929 where it
belongs.

Adds .github/copilot-instructions.md so future agents in this repo see
the release-notes-vs-PR-description distinction up front, plus the
house idioms that come up repeatedly in code review (no [Theory],
GHAzDO casing, AI ruleId convention, sample-fixture convention,
side-effects-after-detection, internals-via-InternalsVisibleTo).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Port SARIF AI generation guidance from ai-plugins to sarif-sdk (#2930)

Make sarif-sdk the single source of truth for the SARIF spec markdown,
the AI-generated-findings profile, and the agent skills that emit and
validate AI SARIF.

Adds:
- docs/spec/sarif-v2.1.0-spec.md
  Convenience markdown rendering of the OASIS SARIF 2.1.0 specification
  (Plus Errata 01). The OASIS-published document is canonical; IPR notice
  preserved at top of file.
- docs/ai/generating-sarif.md
  Normative guidance for representing AI/LLM-produced security findings
  as first-class SARIF: ai/origin declaration, tool identity, result
  structure, exploitability and attacker-position vocabulary, evidence
  model, redaction, notification taxonomy (AI/EXEC/*, AI/CFG/*), and
  the full AI rule-pack appendix. Includes a Mermaid object-model
  diagram in the appendix.
- docs/ai/example.sarif
  Comprehensive reference SARIF log that conforms to the AI profile.
  Passes `dotnet sarif validate --rule-kind 'Sarif;AI'` cleanly.
- skills/emit-sarif-findings/SKILL.md
  Agent-operating procedure for emitting AI SARIF using the
  Sarif.Multitool emit verbs (emit-init-run, add-result,
  add-notification, emit-finalize --validate). Multitool-only;
  cross-references docs/ai/generating-sarif.md as the normative source.
- skills/validate-sarif-findings/SKILL.md
  Agent-operating procedure for validating AI SARIF. Uses
  `--rule-kind 'Sarif;AI'` against the multitool's AI rule pack
  (AI1003-AI2019) plus the standard SARIF rules in one pass.

Updates:
- README.md adds a short pointer section to the new spec, guidance,
  and skills directories.
- docs/multitool-usage.md gains a 'Modes' table entry for each of the
  new emit verbs (emit-init-run, add-result, add-notification,
  emit-finalize) plus a worked example.

Verification gates run before commit:
- `dotnet sarif validate docs/ai/example.sarif --rule-kind 'Sarif;AI'`
  reports 0 errors.
- End-to-end smoke test (init -> add-result -> finalize --validate)
  produces a SARIF file with 1 result, 1 rule (CWE-78 enriched from
  the embedded MITRE CWE taxonomy).
- All skill command snippets match actual --help output for the
  relevant verb at Sarif.Multitool 5.0.0.

Companion work (separate PR in microsoft/ai-plugins):
- Delete plugins/sarif/ entirely; the canonical home is now this
  repository.
- Retool Swallowtail (and other AI-detector plugins in ai-plugins)
  to invoke Sarif.Multitool emit verbs directly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add PublishSampleToGhazdo.ps1 + clone-aware CweGenerateSample.ps1 (#2931)

`CweGenerateSample.ps1` now derives `--vcp-repositoryuri` and the
`emit-finalize --srcroot` prefix from `git -C $repoRoot remote get-url
origin`, falling back to `https://github.com/microsoft/sarif-sdk` when
origin is unset. On the canonical microsoft/sarif-sdk clone the generated
fixtures (CweSample.sarif, CweGHAzDoSample.sarif) are byte-identical to
the previous hardcoded form. GitHub origins get a `<repo>/blob/main/`
SRCROOT prefix; other hosts (including ADO) get the bare repo URL with
a trailing slash.
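
The prefix rule described above, expressed as a small Python sketch. The real logic is in the PowerShell script; `srcroot_prefix` is a hypothetical name, and details such as `.git` suffix handling are not covered here because the commit message does not describe them.

```python
from urllib.parse import urlparse

# Fallback named in the commit message for clones with no origin remote.
FALLBACK_ORIGIN = "https://github.com/microsoft/sarif-sdk"


def srcroot_prefix(origin_url=None) -> str:
    """GitHub origins get '<repo>/blob/main/'; every other host gets '<repo>/'."""
    repo = (origin_url or FALLBACK_ORIGIN).rstrip("/")
    host = (urlparse(repo).hostname or "").lower()
    if host == "github.com":
        return f"{repo}/blob/main/"
    return f"{repo}/"
```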

Adds `src/Sarif/Taxonomies/PublishSampleToGhazdo.ps1` -- POSTs a gzipped
SARIF to the GHAzDO SARIFs ingestion endpoint
(`/{org}/{project}/_apis/alert/repositories/{repo}/sarifs?api-version=
7.2-preview.1` on advsec.dev.azure.com, fallback dev.azure.com). Target
org/project/repo are parsed from runs[0].versionControlProvenance[0]
.repositoryUri; PAT is read from the ADO_PAT environment variable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Scrub Microsoft-internal references from AI guidance port (#2932)

Public-OSS hygiene pass on the SARIF AI guidance and skills.
Descriptor ids that are already shipped in the SDK (AI/EXEC/ALAS-SIGNAL
in AI2018.ProvideExecutionSignalArtifact and AI1014's AI/EXEC/* and
AI/CFG/* prefixes) are kept as-is so the docs match the current SDK
implementation.

Changes:
- Drop ALAS expansion and neutralize the signal-payload schema
  (descriptor id kept; no payload schema was ever enforced by the SDK).
- Replace ProjectApi with FastAPI (five sites) in API-handler examples.
- Replace 'Geneva cluster' with 'telemetry cluster' in a deployment
  example.
- Replace example rule id SWT-CPP-001 with ACME-CPP-001.
- Replace author: mikefan with sarif-sdk-maintainers in both skill
  frontmatters.
- Soften a reference to an unpublished companion remediation guidance
  document.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add multitool add-reporting-descriptor verb

Appends a fully-formed SARIF reportingDescriptor JSON object — supplied
via --input <path> or stdin — to the staged event log produced by
emit-init-run.

Two targets:
* Default → run.tool.driver.notifications[]. AI producers routinely emit
  notification descriptors (progress, telemetry, config errors). No id
  convention is enforced; notifications use opaque ids.
* --rules → run.tool.driver.rules[]. Gated against
  AIRuleIdConvention.IsNovel so only NOVEL- novel-finding descriptors
  are accepted. Taxonomy-mapped rule descriptors (e.g., CWE-89) come
  from the taxonomy enricher at finalize time, not from this verb.

Each descriptor id may appear at most once per event log. The verb scans
the existing event log on receipt and rejects duplicates against either
a prior add-reporting-descriptor event of the same target OR a
descriptor pre-populated on the run-header. A --force escape hatch is
acknowledged in error text but intentionally out of v1 scope.
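
A rough Python rendering of the receipt-time checks (id validation, the NOVEL gate on the rules target, and duplicate rejection). This is a simplification: `check_descriptor` is a hypothetical name, error strings are invented, and `seen_ids` stands for whatever set of ids the caller has collected from the header and prior events.

```python
NOVEL_PREFIX = "NOVEL-"


def check_descriptor(descriptor, seen_ids: set, rules_target: bool = False):
    """Return None when the descriptor is accepted, else a rejection message."""
    if not isinstance(descriptor, dict):
        return "descriptor must be a JSON object"
    descriptor_id = descriptor.get("id")
    if not isinstance(descriptor_id, str) or not descriptor_id:
        return "descriptor id must be a non-empty string"
    # The NOVEL gate applies only when targeting rules, not notifications.
    if rules_target and not descriptor_id.startswith(NOVEL_PREFIX):
        return f"'{descriptor_id}' is not a NOVEL- id"
    if descriptor_id in seen_ids:
        return f"duplicate descriptor id '{descriptor_id}'"
    seen_ids.add(descriptor_id)
    return None
```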

Event-log plumbing:
* Adds SarifEventKinds.RuleDescriptor ("rule-descriptor") and
  SarifEventKinds.NotificationDescriptor ("notification-descriptor"),
  threaded through SarifEventLogReader's kind allow-list.
* SarifEventReplayer buffers descriptor events and merges them into the
  target list BEFORE RegisterDescriptorsFromResults runs. This ordering
  matters: auto-registration synthesizes minimal descriptors only for
  ruleIds that aren't already represented, so an explicit NOVEL-
  descriptor pre-empts the minimal one. Header pre-populated descriptors
  are preserved by reference; the verb's emit-time dedup blocks
  id collisions between header and events.
* New event kinds are additive within CurrentSchemaVersion = 1; older
  readers will skip unknown kinds harmlessly, matching the forward-
  compat shape used when Notification / Invocation kinds were added.
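
Why the merge order matters can be shown with a toy replayer in Python. The function and argument names are hypothetical and the data model is reduced to plain dicts; the point is only that explicit descriptors are merged before minimal ones are synthesized.

```python
def replay_rules(header_rules, descriptor_events, result_rule_ids):
    """Merge explicit descriptors first, then synthesize minimal ones for the rest."""
    rules = list(header_rules) + list(descriptor_events)
    known = {rule["id"] for rule in rules}
    for rule_id in result_rule_ids:
        # Auto-registration only fills gaps; it never replaces an explicit descriptor.
        if rule_id not in known:
            rules.append({"id": rule_id})
            known.add(rule_id)
    return rules


explicit = {"id": "NOVEL-race", "name": "ExampleRace"}
merged = replay_rules([], [explicit], ["NOVEL-race", "CWE-89"])
```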

Tests:
* 16 [Fact] tests on AddReportingDescriptorCommand covering both happy
  paths (notifications default, --rules), id validation (missing/empty/
  non-string), the NOVEL gate (taxonomy id rejection on --rules path
  only), rich payload round-trip (messageStrings, defaultConfiguration,
  helpUri, properties — including a date-shaped property string to guard
  against Json.NET DateTime coercion), duplicate detection within and
  across targets, duplicate detection against header-pre-populated
  descriptors for both target arrays, missing-wip-file path, and two
  malformed-input cases (bad JSON, non-object root).
* 3 [Fact] tests on SarifEventReplayer covering: rule-descriptor events
  populating rules and pre-empting auto-registration, notification-
  descriptor events populating notifications, and the
  header-pre-populated + events merge semantics.

No [Theory]/[InlineData] — repeated scenarios use shared private
helpers (SeedRunHeader) per house style.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Strip editorial prefixes from AI notification taxonomy (#2934)

Notification descriptor ids now name the concern only — `DECISION`,
`RULED-OUT`, `DATA-ACCESS-DENIED`, `ALAS-SIGNAL`, `TOOL-UNAVAILABLE`,
etc. The previous `AI/EXEC/*` and `AI/CFG/*` prefixes repeated context
the surrounding SARIF already carries: the array
(`toolExecutionNotifications` vs `toolConfigurationNotifications`)
encodes the kind, and `tool.driver.name` encodes the emitter. The
same id MAY now legally appear in both arrays. Suffixing `EXEC` or
`CFG` on every id is like suffixing `Class` on every C# class — the
surrounding context already says what kind of thing it is.

Placement is selected at authoring time: `add-notification` defaults
to `toolExecutionNotifications`; `add-notification --config` (`-c`)
routes to `toolConfigurationNotifications`. The event-log kind
`SarifEventKinds.Notification` splits into `ExecutionNotification`
(`"execution-notification"`) and `ConfigurationNotification`
(`"configuration-notification"`); the replayer routes each to the
matching invocation array.
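
The placement rule reduces to a two-entry lookup. The Python sketch below uses the wire values and array names from the description; the function names and the event dictionary shape are assumptions made for illustration.

```python
KIND_TO_ARRAY = {
    "execution-notification": "toolExecutionNotifications",
    "configuration-notification": "toolConfigurationNotifications",
}


def event_kind(config: bool = False) -> str:
    """Authoring side: the --config switch selects the configuration kind."""
    return "configuration-notification" if config else "execution-notification"


def replay(events) -> dict:
    """Replay side: route each notification into the matching invocation array."""
    invocation = {name: [] for name in KIND_TO_ARRAY.values()}
    for event in events:
        invocation[KIND_TO_ARRAY[event["kind"]]].append(event["payload"])
    return invocation


# The same descriptor id may appear in both arrays under the new convention.
events = [
    {"kind": event_kind(), "payload": {"descriptor": {"id": "TOOL-UNAVAILABLE"}}},
    {"kind": event_kind(config=True), "payload": {"descriptor": {"id": "TOOL-UNAVAILABLE"}}},
]
invocation = replay(events)
```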

`AI1014.ExecutionNotificationPlacement` is deleted. Its sole purpose
was enforcing prefix-vs-array consistency, which is structurally
meaningless under the new convention (the array IS the kind).
`AI2018` retains its semantic; the literal id it checks changes from
`AI/EXEC/ALAS-SIGNAL` to `ALAS-SIGNAL`.

BRK by the letter of v4.6.3 (AI1014 was added there), but AI rules
adoption is low and v5.x is the right place for refinement over
back-compat.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Generalize ALAS-SIGNAL notification id to LEARNING-SIGNAL (#2935)

ALAS named a specific consumer (an internal learning system). Under
the convention shipped in #2934, notification ids name the concern,
not the consumer. LEARNING-SIGNAL describes what the signal is,
independent of who reads it.

While here, rename the AI2018 rule from ProvideExecutionSignalArtifact
to ProvideLearningSignalArtifact for consistency: the class checks
the LEARNING-SIGNAL id, the "Execution" qualifier was redundant under
the new convention (placement is encoded by the array, not the id),
and downstream learning systems aren't necessarily reading only the
execution-side array.

Affects: AI2018 rule class + file + RuleId const + 3 resource
keys/messages, the AI2018 row in docs and skills tables, and the
UNRELEASED BRK bullet for #2934 (whose own "ALAS-SIGNAL example"
becomes "LEARNING-SIGNAL", and which now documents the
class-and-id rename together).

BRK on the just-merged BRK (both still UNRELEASED) — favored over
shipping the consumer name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Stamp ReleaseHistory UNRELEASED section as v5.0.0 (#2937)

build.props was already bumped to <VersionPrefix>5.0.0</VersionPrefix>
in #2924 (the SHA-1 BRK). This finishes the v5.0.0 cut by replacing
the UNRELEASED placeholder header in ReleaseHistory.md with the
canonical version banner (Sdk / Driver / Converters / Multitool /
Multitool Library nuget links), matching the v4.6.4 format.

Picked up by #2936 (the dev to main promotion PR).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trim and split the over-descriptive v5.0.0 notification-taxonomy BRK (#2938)

Per release-notes house style, bullets are one or two self-contained
sentences; PR-description prose belongs in the PR. The original bullet
was ~3x the length of its neighbors and re-litigated the motivation.

Split into two tighter bullets:

  1. Convention change + routing mechanism (id-prefix strip, new
     --config switch, event-kind split).
  2. Rule-table changes (AI1014 removal, AI2018 rename).

Drops the "prefixes were redundant because..." explanation, the wire
value parentheticals (`"execution-notification"` etc.), and the
"ALAS named a specific consumer" parenthetical. The change itself
is visible in the renames; the reader doesn't need the rationale.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix CweGenerateSample.ps1 -GHAzDO crash inside real ADO pipelines (#2940)

The mseng microsoft.sarif-sdk pipeline broke on the first build of main
after the v5.x promotion (build 31555367). Symptom:

  CweGenerateSample.ps1 (args: -Configuration Release -GHAzDO) exited with code 1.
  ADO pipeline context is partially configured. Either populate every
  required variable or clear them all.
  Problems:
    BUILD_DEFINITIONID='1234' disagrees with SYSTEM_DEFINITIONID='9978'
    (both name the same pipeline identifier and must match)

Root cause: the deterministic-fixture env override in CweGenerateSample.ps1
stamps BUILD_DEFINITIONID=1234 for byte-stable output, but does not also
override SYSTEM_DEFINITIONID. ADO agents inject both. The verb's
must-match cross-check in AdoPipelineContext.TryDetect (correctly) refuses
to proceed when the two disagree.
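
The must-match cross-check behaves roughly like this Python sketch. Only the BUILD_DEFINITIONID / SYSTEM_DEFINITIONID pair is taken from the error above; the function name is hypothetical and the real check may cover more pairs.

```python
# Pairs that name the same pipeline identifier and therefore must agree.
MUST_MATCH = (("BUILD_DEFINITIONID", "SYSTEM_DEFINITIONID"),)


def cross_check(env: dict) -> list:
    """Report every pair where both variables are set but disagree."""
    problems = []
    for primary, fallback in MUST_MATCH:
        first, second = env.get(primary), env.get(fallback)
        if first and second and first != second:
            problems.append(f"{primary}='{first}' disagrees with {fallback}='{second}'")
    return problems


# Reproduces the failing combination: the script's override plus the agent's value.
failing = cross_check({"BUILD_DEFINITIONID": "1234", "SYSTEM_DEFINITIONID": "9978"})
```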

Fix the script (not the verb): add SYSTEM_DEFINITIONID alongside
BUILD_DEFINITIONID in the \ ordered hashtable, plus
SYSTEM_JOBID / SYSTEM_JOBNAME alongside SYSTEM_PHASEID / SYSTEM_PHASENAME
for symmetric hygiene (those pairs are exempt from must-match but the
default-mode \ cleanup loop iterates \ and
benefits from covering the agent's full fallback set). The fixture SARIF
bytes do not change — the primary env vars were already set and are the
ones the verb actually reads.

Regression gate: new
  CweGHAzDoSample_RegenerationSucceeds_WhenAmbientAdoFallbackEnvVarsConflict
[Fact] explicitly seeds SYSTEM_DEFINITIONID / SYSTEM_JOBID / SYSTEM_JOBNAME
with values that disagree with the script's deterministic primaries
before invoking the script. Without the script fix it fails the same way
the mseng build did; with the fix it passes byte-identical.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Refresh v5.0.0 release-history layout + add prefix legend (#2939)

Three changes, all in ReleaseHistory.md:

1. Add a prefix legend at the top of the file. Codifies the six prefixes
   (DEP / BRK / BUG / NEW / PRF / FUN) and the 'BRK leads each section'
   rule. Footnote notes that older sections may predate the convention.

2. Reorder the v5.0.0 section so all BRK bullets lead (BRK -> NEW -> BUG).
   Pure line shuffling; relative order preserved within each group.

3. Normalize the lone 'BUGFIX:' bullet in v4.6.4 to 'BUG:' (matches the
   legend's canonical form). The deep-history 'BUGFIX, BRK:' entry in
   the v1.x section is left alone — that's immutable shipped state.

No code or schema changes; ReleaseHistory.md only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace emit-init-run flags with SARIF Run JSON contract

A consumer agent reports the existing 14 typed CLI flags can't express
multiple versionControlProvenance entries (one of which carries a
properties bag documenting skills in play). Modeling every field as a
flag explodes the surface; the peer emit verbs (add-result,
add-notification, add-reporting-descriptor) already accept fully-formed
SARIF JSON via --input or stdin for exactly this reason, with a
documented rationale that applies more strongly to the run header than
to a single result.

Replace the v5.0.0 flag surface on emit-init-run with the same
input/stdin payload contract. EmitInitRunOptions shrinks to three
properties (OutputFilePath, InputFilePath, ForceOverwrite). The
SarifEventReplayer's documented partial-Run shape (tool, language,
columnKind, defaultEncoding, defaultSourceLanguage, originalUriBaseIds,
versionControlProvenance, automationDetails, baselineGuid,
redactionTokens, ...) is now reachable end-to-end through the verb.

Receipt-time validators (no filesystem side-effects on rejection):
required non-empty-string tool.driver.name; https-only
tool.driver.informationUri and versionControlProvenance[].repositoryUri;
https-or-file originalUriBaseIds["SRCROOT"].uri; canonical 8-4-4-4-12
automationDetails.guid/correlationGuid; exact-match ai/origin in
{generated, annotated, synthesized}; SARIF-log-document rejection;
parent-shape JSON-object enforcement at every nested accessor so a
JValue indexer never throws into the broad catch. ADO stamping is now
JToken-direct so producer-supplied SARIF fields outside the SDK typed
Run model survive the wip-line append; the existing typed-Run
materialization at emit-finalize is the documented boundary at which
non-typed fields are dropped, consistent with every other SDK
round-trip.
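
A few of these validators, sketched in Python over a parsed JSON payload. This covers only a subset (driver name, https-only URIs, canonical GUIDs, parent-shape checks); the function names and messages are hypothetical, and the C# implementation operates on JToken rather than dicts.

```python
import re
from urllib.parse import urlparse

# Canonical 8-4-4-4-12 hexadecimal form.
GUID = re.compile(r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}")


def is_https(value) -> bool:
    return isinstance(value, str) and urlparse(value).scheme == "https"


def validate_run_header(run: dict) -> list:
    """Collect every problem up front so nothing is written on rejection."""
    tool = run.get("tool")
    driver = tool.get("driver") if isinstance(tool, dict) else None
    if not isinstance(driver, dict):
        return ["tool.driver must be a JSON object"]
    errors = []
    name = driver.get("name")
    if not isinstance(name, str) or not name:
        errors.append("tool.driver.name must be a non-empty string")
    if "informationUri" in driver and not is_https(driver["informationUri"]):
        errors.append("tool.driver.informationUri must be https")
    provenance = run.get("versionControlProvenance", [])
    for index, entry in enumerate(provenance if isinstance(provenance, list) else []):
        uri = entry.get("repositoryUri") if isinstance(entry, dict) else None
        if uri is not None and not is_https(uri):
            errors.append(f"versionControlProvenance[{index}].repositoryUri must be https")
    details = run.get("automationDetails")
    for key in ("guid", "correlationGuid"):
        value = details.get(key) if isinstance(details, dict) else None
        if value is not None and not (isinstance(value, str) and GUID.fullmatch(value)):
            errors.append(f"automationDetails.{key} must be a canonical GUID")
    return errors
```

Checking that each parent is an object before indexing into it is the Python counterpart of the parent-shape enforcement mentioned above.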

AdoPipelineContext.ApplyTo(Run) becomes
bool TryApplyTo(Run, out string error). It stamps automationDetails.id
and the four azuredevops/pipeline/build/* properties only when absent
and fails-with-diagnostic on per-field conflict. The previous
unconditional-overwrite contract was inert in v5.0.0 (the flag surface
couldn't supply those fields) but became a footgun once JSON input
could.
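
The stamp-when-absent, fail-on-conflict contract can be sketched in Python as below. The function name and the property keys in the usage lines are placeholders, not the four real azuredevops/pipeline/build/* keys.

```python
def try_apply(properties: dict, stamped: dict) -> tuple:
    """Return (ok, error); mutate properties only when there is no conflict."""
    conflicts = sorted(
        key for key, value in stamped.items()
        if key in properties and properties[key] != value
    )
    if conflicts:
        return False, "conflicting producer-supplied values: " + ", ".join(conflicts)
    for key, value in stamped.items():
        # setdefault leaves an identical producer-supplied value untouched.
        properties.setdefault(key, value)
    return True, ""


producer = {"example/key-a": "1"}
ok, error = try_apply(producer, {"example/key-a": "1", "example/key-b": "2"})
```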

CweGenerateSample.ps1 rewrites its emit-init-run call to construct a
PowerShell hashtable -> ConvertTo-Json -Depth 32 -Compress -> stdin
pipe. Both CweSample.sarif and CweGHAzDoSample.sarif regenerate
byte-identically (verified by CweGeneratedSampleTests, which gates on the
fixtures' SHA-256 hashes).

skills/emit-sarif-findings/SKILL.md Step 1 is rewritten to show the
JSON construction; the inputs table picks up the multi-VCP and
properties-bag annotations; the package constraint bumps to
Sarif.Multitool >= 5.1.0. docs/multitool-usage.md's flag example is
replaced with the stdin form.

ReleaseHistory.md gets a new v5.1.0 UNRELEASED section with three
bullets: BRK on the flag-surface removal, BRK on
AdoPipelineContext.ApplyTo, NEW on the JSON-payload contract.

Verification:
- dotnet build src/Sarif.Sdk.sln: 0 warnings, 0 errors.
- Test.UnitTests.Sarif.Multitool.Library: 217 passed, 1 skipped.
- Test.UnitTests.Sarif: 896 passed, 3 skipped.
- Test.UnitTests.Sarif.Driver: 140 passed, 1 skipped.
- CweGeneratedSampleTests (3): pass; both fixtures byte-identical.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Stamp ReleaseHistory v5.0.1 section with nuget links

src/build.props was bumped to <VersionPrefix>5.0.1</VersionPrefix>
in be6fb70 (the emit-init-run JSON-contract change). This finishes
the v5.0.1 cut by replacing the UNRELEASED placeholder header in
ReleaseHistory.md with the canonical version banner (Sdk / Driver /
Converters / Multitool / Multitool Library nuget links), matching the
v5.0.0 format. Folded into #2942 so main is shippable the moment the
promotion lands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trim v5.0.1 release-notes bullets to neighbor density

The three bullets were 4-6 sentences each, embedding validator catalogs
and finalize-time round-trip prose that belong in the PR description,
not in ReleaseHistory.md. Repo style explicitly calibrates against the
neighbors and asks for trim/split when a bullet exceeds ~3x — the BRK
and NEW bullets here now sit at roughly the same density as the v5.0.0
rename bullets above them.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add multitool add-invocation verb

Mirrors add-result / add-notification / add-reporting-descriptor: takes a
fully-formed SARIF Invocation JSON object via --input <path> or stdin and
appends it to the staged event log as a SarifEventKinds.Invocation event.

SarifEventReplayer strips run.invocations[] carried on the run header, so
this verb is the only path producers have to populate the array. The verb
imposes no schema beyond "must be a JSON object" (SARIF makes every field on
Invocation optional); full-log shape validation lives in emit-finalize --validate.

AddInvocationOptions / AddInvocationCommand follow the established pattern.
Program.cs registers and dispatches the new verb. SKILL.md, docs/multitool-usage.md,
and ReleaseHistory.md updated.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Drop unused System.Text using in EmitInitRunCommandTests

CI's BuildAndTest.ps1 invokes dotnet build with --no-incremental and
/p:EnforceCodeStyleInBuild=true, which surfaces IDE0005 (unused using)
as an error. Local Debug + default incremental builds skipped the check
and let the unused System.Text directive ride into be6fb70.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix NOVEL ruleId example and clarify enricher ownership of region.snippet

Two corrections to docs/ai/generating-sarif.md flagged by an external
AI-authoring feedback session that retargeted against Sarif.Multitool@5.0.1:

1. The Novel-findings subsection used the slash form ('NOVEL/<sub-id>' on
   result.ruleId, bare 'NOVEL' on the descriptor) - which AddResultCommand
   and AddReportingDescriptorCommand reject at receipt. The canonical form
   (per docs/AI-RuleId-Convention.md and AIRuleIdConvention.s_novelPrefix)
   is the dash-flat 'NOVEL-<sub-id>'; descriptor.id and result.ruleId are
   byte-identical. The obsolete 'ruleIndex required for NOVEL' paragraph is
   removed - each NOVEL- now has a unique id, so the SARIF 3.19.23
   non-unique-id workaround no longer applies.

2. The Code Context subsection told AI tools to populate region.snippet
   and contextRegion.snippet on every finding. emit-finalize already runs
   InsertOptionalDataVisitor with RegionSnippets | ContextRegionSnippets |
   ComprehensiveRegionProperties | Hashes, reading the file from disk and
   filling these fields itself. The producer SHOULD emit region.startLine
   and region.endLine; the enricher owns everything else. Pre-populating
   wastes tokens and drift-risks the consumer's view of the file.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Restore RuleKind.Ado as [Obsolete] alias for RuleKind.GHAzDO (#2945)

#2928 renamed RuleKind.Ado to RuleKind.GHAzDO without leaving a
back-compat alias. Restore Ado as an [Obsolete] alias resolving to
the same underlying value (4), so pre-rename source still compiles
and '--rule-kind ado' continues to bind on the multitool CLI via
the existing case-insensitive enum parser. The obsolete-warning
steers new callers off the deprecated spelling without breaking
them. Two new [Fact] tests pin the alias contract (same value;
case-insensitive parse of 'ado' resolves to GHAzDO).
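
The alias contract (same underlying value, case-insensitive parse) has a close analogue in Python's enum module, shown here purely as an illustration. Python has no counterpart to [Obsolete] in this sketch, only the value 4 comes from the commit message, and `parse_rule_kind` is a hypothetical helper.

```python
from enum import Enum


class RuleKind(Enum):
    GHAzDO = 4
    Ado = 4  # a second name with the same value becomes an alias of GHAzDO


def parse_rule_kind(text: str) -> RuleKind:
    """Case-insensitive lookup over all names, aliases included."""
    for name, member in RuleKind.__members__.items():
        if name.lower() == text.lower():
            return member
    raise ValueError(f"unknown rule kind: {text}")


parsed = parse_rule_kind("ado")
```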

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Trigger Validate workflow on dev-targeted pushes and PRs

PRs targeting dev were silently skipping the build-and-test /
check-format / build-multitool-for-npm jobs because the workflow
filter was scoped to main only. Adding dev to both the push and
pull_request branch lists so CI gates dev work the same way it
gates main work.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Consolidate v5.0.1 release-notes for actual ship

v5.0.1 was tagged but never published to NuGet, so the bullet that lived under an unreleased v5.0.2 header folds back into v5.0.1 — that's the version that will actually ship from this PR.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Make autocrlf-sensitive tests deterministic across line-ending configurations (#2859)

On Windows agents with `core.autocrlf=input` (LF on disk, CRLF Environment.NewLine), several tests compared values normalized to `Environment.NewLine` against C# verbatim string literals whose embedded newlines were LF on disk. They passed on default Windows (`autocrlf=true`: CRLF on disk = CRLF NewLine) and on Linux/Mac (`autocrlf=input`: LF on disk = LF NewLine) but failed on the cross-grained combo that no CI configuration exercises.

Two principled moves:

1. Boundary normalization in `TestAssetResourceExtractor.GetResourceText` —
   canonicalize the read text to `Environment.NewLine` (`\r\n` -> `\n` ->
   `Environment.NewLine`). This is the single point where text resources
   enter the test harness, so every consumer (`FileDiffingUnitTests`,
   `InsertOptionalDataVisitorTests`, and ad-hoc callers) inherits the
   normalization for free.

2. Rewrite the affected literal-string assertions to express newlines
   explicitly with `Environment.NewLine` rather than rely on the source
   file's on-disk line endings: `string.Join(Environment.NewLine, ...)`
   for multi-line bodies, `$@"...{Environment.NewLine}..."` for short
   ones. Touches `StackTests`, `WebRequestTests`, `WebResponseTests`,
   `AndroidStudioConverterTests`, `FortifyUtilitiesTests`.
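
The boundary normalization in item 1 is a two-step replace. A minimal Python sketch of the same idea follows; `normalize_newlines` is a hypothetical name, and `os.linesep` stands in for `Environment.NewLine`. Collapsing CRLF to LF first matters: replacing `\n` directly in CRLF text would produce `\r\r\n`.

```python
import os


def normalize_newlines(text: str, newline: str = os.linesep) -> str:
    """Canonicalize CRLF and LF line endings to `newline`."""
    # Step 1: collapse CRLF to LF so mixed inputs converge on one form.
    unified = text.replace("\r\n", "\n")
    # Step 2: expand every LF to the target newline.
    return unified.replace("\n", newline)
```

Building expected values with an explicit join, as item 2 describes, then compares equal regardless of how the source file is stored on disk.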

`InsertOptionalDataVisitor.txt` is embedded as a resource and hashed by a
`[Trait(TestTraits.WindowsOnly, "true")]` test where the on-disk hash must
remain stable, so `.gitattributes` pins that file to `eol=crlf`.

Co-authored-by: Michael C. Fanning <mikefan@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Drop AI1015 ProvideRunDefaultSourceLanguage from AI authoring guidance (#2948)

* Drop AI1015 ProvideRunDefaultSourceLanguage from AI authoring guidance

The partition-by-language model proved insufficient as a baseline-updating strategy. The earlier hypothesis was that, on receipt of a new log file, we could replace AI results for one language while retaining results for another. Remove the MUST that an AI run set 'run.defaultSourceLanguage' and partition by '(repository, branch, language)' tuple, along with the 'Run partitioning by language' section in 'docs/ai/generating-sarif.md' and the AI1015 row in the rule table.

'defaultSourceLanguage' remains an accepted optional SARIF Run field for viewer rendering; we simply no longer mandate it. No code references existed for AI1015, so this is a pure doc walkback.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Reclassify AI1015 drop as BUG and trim bullet

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix SARIF1012 NRE when result.ruleId does not resolve (#2944) (#2949)

`SARIF1012.MessageArgumentsMustBeConsistentWithRule` previously threw a
`NullReferenceException` when `result.message.id` was set but
`result.ruleId` did not resolve to a rule in `tool.driver.rules[]` (no
match by id, ruleIndex, or hierarchical base). The null-guard at
lines 52-53 inspected `currentRules` (the collection) rather than the
resolved `rule` instance, and used a short-circuiting null-conditional
check that silently fell through to the diagnostic-emit branch's
indexer access.

Replace the guard with an explicit three-prong check:

    rule == null
    || rule.MessageStrings == null
    || !rule.MessageStrings.ContainsKey(result.Message.Id)

All three cases now emit the existing `MessageIdMustExist`
diagnostic with the unresolved rule id (or `null`).
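
The same three-prong guard, sketched in Python over plain dictionaries. This is an illustration only: `message_id_is_missing` is a hypothetical helper, and the `messageStrings` key mirrors the SARIF property name rather than the SDK's object model.

```python
from typing import Optional


def message_id_is_missing(rule: Optional[dict], message_id: str) -> bool:
    """Return True when the MessageIdMustExist diagnostic should fire."""
    return (
        rule is None                                  # ruleId did not resolve
        or rule.get("messageStrings") is None         # rule has no messageStrings
        or message_id not in rule["messageStrings"]   # id is not defined
    )
```

Because `or` short-circuits left to right, the indexer access in the third prong only runs once the first two have established that both objects exist.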

Regression fixture: adds a 4th result to
`SARIF1012.MessageArgumentsMustBeConsistentWithRule_Invalid.sarif`
that references a non-existent `NoSuchRule` with `message.id: AnyId`,
plus the corresponding expected diagnostic at `startLine: 53`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Replace Generate-CweTaxonomy.ps1 with cross-platform Python regenerator (#2921) (#2950)

The PowerShell-only 'scripts/Generate-CweTaxonomy.ps1' assumed a Windows /
pwsh environment to regenerate the CWE taxonomy assets, which is friction
for non-Windows contributors and the original complaint behind #2921.

Replace it with a single 'scripts/generate_cwe_taxonomy.py' (Python 3,
stdlib only) that runs identically on Linux, macOS, and Windows. The
script:

* Downloads 'cwec_latest.xml.zip' via 'urllib.request' into a temporary
  staging directory.
* Extracts via 'zipfile' and locates the embedded 'cwec_v*.xml'.
* Parses every weakness (handling the CWE XML namespace), buckets by
  Status, sorts by numeric ID, derives the SARIF2012-conformant
  Pascal-case identifier (preferring the parenthesized common name when
  present), resolves the View-1000 ChildOf Primary parent, and emits
  the four-section help markdown (Description / Extended Description /
  Common Consequences / Potential Mitigations).
* Writes 'CweTaxonomy.sarif' and 'CweTaxonomy.brief.md' in place under
  'src/Sarif/Taxonomies/' with UTF-8 (no BOM) and LF line endings so the
  embedded resources hash identically regardless of host OS.
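
Three of the steps above can be sketched in a few lines. This is a hedged illustration, not the script's actual code: the function names are hypothetical, and the exact tokenization rule for the Pascal-case identifier is an assumption (the commit message states only that the parenthesized common name is preferred).

```python
import re


def pascal_case_id(name: str) -> str:
    """Derive a Pascal-case identifier from a CWE weakness name."""
    # Prefer a trailing parenthesized common name when one is present.
    match = re.search(r"\(([^()]+)\)\s*$", name)
    source = match.group(1) if match else name
    # Assumption: any run of letters or digits counts as one word.
    words = re.findall(r"[A-Za-z0-9]+", source)
    return "".join(word[:1].upper() + word[1:] for word in words)


def sort_by_numeric_id(weaknesses: list) -> list:
    """Sort records by integer ID so that 79 precedes 100."""
    return sorted(weaknesses, key=lambda w: int(w["ID"]))


def encode_lf_utf8(text: str) -> bytes:
    """Return bytes with LF line endings and no byte-order mark."""
    return text.replace("\r\n", "\n").encode("utf-8")
```

Writing the result of `encode_lf_utf8` in binary mode keeps the host OS from translating line endings, which is what lets the generated artifacts hash identically across platforms.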

Default invocation requires no arguments and downloads from MITRE:

    python3 scripts/generate_cwe_taxonomy.py

For offline / testing scenarios, '--xml path/to/cwec_v*.xml' bypasses
the download; '--source-url' overrides the MITRE URL; '--output-dir'
overrides the artifact destination.
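
A minimal sketch of how these three flags could be declared with `argparse`. The flag names and the default output directory come from this commit message; the full default URL is an assumption (only the file name 'cwec_latest.xml.zip' is stated), and `build_parser` is a hypothetical name.

```python
import argparse

# Assumption: the commit message gives the file name but not the full URL.
DEFAULT_SOURCE_URL = "https://cwe.mitre.org/data/xml/cwec_latest.xml.zip"
DEFAULT_OUTPUT_DIR = "src/Sarif/Taxonomies"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Regenerate the CWE taxonomy assets.")
    # When --xml is given, the download step is skipped entirely.
    parser.add_argument("--xml", default=None,
                        help="Use a pre-extracted XML (offline / testing).")
    parser.add_argument("--source-url", default=DEFAULT_SOURCE_URL,
                        help="Override the MITRE download URL.")
    parser.add_argument("--output-dir", default=DEFAULT_OUTPUT_DIR,
                        help="Override the artifact destination.")
    return parser
```

With every flag defaulted, `build_parser().parse_args([])` yields the zero-argument behavior described above.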

'.gitattributes' retains the LF pinning on the two generated artifacts.
'src/Sarif/Taxonomies/CweReadme.md' Regeneration section is rewritten
to show the single Python invocation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Narrow SARIF1001 case-fold relaxation to AI-origin notification descriptors (#2951) (#2955)

Initial revision of this fix universally relaxed SARIF1001's id/name comparison
to 'StringComparison.Ordinal' on the grounds that SARIF v2.1.0 § 3.49.7 only
forbids strict-identical pairs. Per maintainer review, that read overshoots:
the case-fold comparison is an authorial SHOULD layered above the spec MUST,
and SHOULD layers are precisely what validators add value with. Removing the
SHOULD globally weakens the typo-catcher for hand-authored descriptors.

The narrower carve-out: AI notification taxonomies (issue #2952) deliberately
pair a SCREAMING-CAPS opaque id with the corresponding PascalCase end-user
name (e.g. 'DECISION' / 'Decision'). That convention is machine-coordinated
and specific to AI emitters; it does not apply to hand-authored rule and
taxon descriptors.

The cut is the intersection of two existing context signals:

  1. 'IsAIOriginRun()' -- the run carries 'properties["ai/origin"]', the
     same gate used by SARIF2002, SARIF2009 (the literal peer rule for
     identifier conventions), SARIF2014, and SARIF2015.

  2. 'Context.CurrentReportingDescriptorKind == Notification' -- the
     descriptor was reached via 'tool.driver.notifications[]', not
     'rules[]' or 'taxa[]'. AI rule ids are constrained by AI1012 to
     'BASE/sub-id' or 'NOVEL-<sub-id>' forms whose hyphens / slashes
     cannot case-fold-collide with any PascalCase name, so extending
     the carve-out to rules would be a no-op for AI and a regression
     for hand-authored taxonomies.

Strict-identical comparisons remain unconditional everywhere (spec MUST).
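
The resulting decision can be sketched as a small predicate. This is an illustration under stated assumptions, not the validator's code: `id_name_collides` and its string-valued `kind` parameter are hypothetical, and `casefold()` stands in for the rule's case-insensitive comparison.

```python
def id_name_collides(descriptor_id: str, name: str,
                     is_ai_origin_run: bool, kind: str) -> bool:
    """Return True when SARIF1001 should flag the id/name pair."""
    # Spec MUST: strict-identical pairs are flagged unconditionally.
    if descriptor_id == name:
        return True
    # Carve-out: both context signals must hold at the same time.
    if is_ai_origin_run and kind == "notification":
        return False
    # Authorial SHOULD: case-fold comparison everywhere else.
    return descriptor_id.casefold() == name.casefold()
```

For example, `id_name_collides("DECISION", "Decision", True, "notification")` is the only combination of the two signals for which that pair passes.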

The Invalid functional fixture now spans three runs covering each
boundary of the carve-out:

  run[0] non-AI         rules        RULE0001/RULE0001   (spec MUST)
                                     RULE0002/RULE0002   (spec MUST)
  run[1] AI-origin      notifications STRICT/STRICT       (spec MUST under AI)
                        rules        DECISION/Decision   (rules-not-exempt)
  run[2] non-AI         notifications DECISION/Decision   (non-AI-not-exempt)

The Valid functional fixture pairs a non-AI tool with hand-authored rules
and an AI-origin tool whose notifications carry 'DECISION/Decision' and
'RULE-COVERAGE-GAP/RuleCoverageGap' -- the latter pair documents the
taxonomy convention even though hyphens already keep it case-fold-distinct.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add GHAzDO1021 (ProvideShortBranchNameInVcp) (#2954) (#2958)

Add a GHAzDO validation rule for run.versionControlProvenance[].branch values that start with refs/<class>/, using ^refs/[^/]+/ to detect full-ref branch names and recommending the stripped short form.

The AdvSec Service silently drops VCP entries whose branch is not a short branch name, so the rule is scoped only to versionControlProvenance[].branch. It deliberately does not flag the full-ref branch segment embedded in run.automationDetails.id, which remains part of the existing GHAzDO1020 contract.

Add valid and invalid ValidateCommand fixtures covering short branch names and refs/heads, refs/tags, and refs/pull full refs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Enrich versionControlProvenance from CI pipeline env (ADO + GitHub Actions) (#2957) (#2959)

EmitInitRunCommand currently stamps automationDetails.id and the four
`azuredevops/pipeline/build/*` property keys when `TF_BUILD=True` is
detected, but it does not lift anything into `versionControlProvenance`
- even though `BUILD_SOURCEBRANCH` is already in hand. AdvSec ingestion
relies on VCP for branch + revision attribution, and this gap means the
data lands only when a producer hand-constructs the entry into the
run-header JSON. The same gap exists for AI scanners running under
GitHub Actions.

Extend AdoPipelineContext to read the two optional env vars
`BUILD_REPOSITORY_URI` and `BUILD_SOURCEVERSION`, and derive a short
branch name from the existing `BUILD_SOURCEBRANCH` by stripping any
leading `refs/<class>/` segment (so `refs/heads/main` becomes
`main`; `refs/pull/42/merge` becomes `42/merge`; `refs/tags/v1`
becomes `v1`). The two new env vars are optional - absence does not
degrade Complete -> Partial - but malformed presence does (URI must be
absolute http(s); revision must match `^[0-9a-fA-F]{7,40}$`).
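
The derivation and the two validity checks are small enough to sketch directly. The regular expressions are the ones quoted above; the function names are hypothetical, and the URI check via `urlsplit` is an assumption about what "absolute http(s)" means in practice.

```python
import re
from urllib.parse import urlsplit

_REF_PREFIX = re.compile(r"^refs/[^/]+/")


def short_branch_name(ref: str) -> str:
    """Strip one leading refs/<class>/ segment, if present."""
    return _REF_PREFIX.sub("", ref, count=1)


def is_valid_revision(revision: str) -> bool:
    """7 to 40 hex digits; fullmatch plays the role of ^...$ here."""
    return re.fullmatch(r"[0-9a-fA-F]{7,40}", revision) is not None


def is_absolute_http_uri(uri: str) -> bool:
    """Accept only http(s) URIs that carry a host."""
    parts = urlsplit(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

Only the first segment after `refs/` is removed, which is why `refs/pull/42/merge` keeps its `42/merge` tail.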

Add a parallel `GitHubActionsContext` for `GITHUB_ACTIONS=true`
that mirrors the same shape but is VCP-scoped (no pipeline-identity
contract today): `GITHUB_SERVER_URL` + `GITHUB_REPOSITORY` compose
the repository URI, `GITHUB_SHA` supplies the revision (same hex
regex), and `GITHUB_REF_NAME` is preferred over `GITHUB_REF`
(stripping the same `refs/<class>/` prefix). Custom GHES servers
compose correctly via trailing-slash normalization.
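
A rough Python sketch of the GitHub Actions side, assuming the environment is passed in as a mapping. `read_github_actions_vcp` is a hypothetical name, the exact-match test on `GITHUB_ACTIONS` is an assumption, and revision validation is left out for brevity.

```python
import re
from typing import Mapping, Optional

_REF_PREFIX = re.compile(r"^refs/[^/]+/")


def read_github_actions_vcp(env: Mapping[str, str]) -> Optional[dict]:
    """Return the detected VCP fields, or None outside GitHub Actions."""
    if env.get("GITHUB_ACTIONS") != "true":
        return None
    server = env.get("GITHUB_SERVER_URL")
    repository = env.get("GITHUB_REPOSITORY")
    uri = None
    if server and repository:
        # Trailing-slash normalization keeps custom GHES hosts well-formed.
        uri = server.rstrip("/") + "/" + repository
    # GITHUB_REF_NAME wins; GITHUB_REF is the fallback.
    ref = env.get("GITHUB_REF_NAME") or env.get("GITHUB_REF")
    branch = _REF_PREFIX.sub("", ref, count=1) if ref else None
    return {"repositoryUri": uri,
            "revisionId": env.get("GITHUB_SHA"),
            "branch": branch}
```

Taking a mapping rather than reading `os.environ` directly keeps the function easy to exercise with synthetic environments.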

EmitInitRunCommand grows a `TryStampVcp` JObject-direct stamper that
mirrors the existing `TryStampAdoContext` shape and operates on three
input shapes:

1. `versionControlProvenance` absent or empty array -> synthesize a
   new entry only when `repositoryUri` was detected (anchor field).
   Branch/revision without a repository URI is informationally thin
   and the synthesized entry would not bind to a repo for ingestion.
2. `versionControlProvenance` contains exactly one entry -> enrich
   missing fields; fail with a per-field conflict diagnostic when any
   supplied field disagrees with the detected pipeline value. Repository
   URI equality treats scheme/host case-insensitively (RFC 3986) via
   `Uri.TryCreate` round-trip; branch and revision are byte-wise.
3. `versionControlProvenance` contains multiple entries -> leave
   untouched. The caller has declared a multi-repo shape and we refuse
   to guess which entry names the pipeline's source repo.
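
The three input shapes can be sketched over plain dictionaries. This is a simplified illustration, not the `TryStampVcp` implementation: `try_stamp_vcp` returns a list of conflict messages instead of emitting diagnostics, and the URI comparison via `urlsplit` is an assumed stand-in for the `Uri.TryCreate` round-trip.

```python
from urllib.parse import urlsplit


def _same_uri(left: str, right: str) -> bool:
    """Scheme and host compare case-insensitively; the rest is exact."""
    a, b = urlsplit(left), urlsplit(right)
    return ((a.scheme.lower(), a.netloc.lower()) ==
            (b.scheme.lower(), b.netloc.lower()) and a[2:] == b[2:])


def try_stamp_vcp(run: dict, detected: dict) -> list:
    """Stamp detected fields into run; return conflicts (empty on success)."""
    fields = {k: v for k, v in detected.items() if v is not None}
    entries = run.get("versionControlProvenance") or []
    if len(entries) > 1:
        return []  # shape 3: multi-repo, leave untouched
    if not entries:
        # Shape 1: synthesize only when the anchor field was detected.
        if "repositoryUri" in fields:
            run["versionControlProvenance"] = [dict(fields)]
        return []
    entry = entries[0]
    # Shape 2: probe every supplied field before writing anything.
    conflicts = []
    for key, value in fields.items():
        existing = entry.get(key)
        if existing is None:
            continue
        same = (_same_uri(existing, value) if key == "repositoryUri"
                else existing == value)
        if not same:
            conflicts.append(f"{key}: {existing!r} disagrees with {value!r}")
    if conflicts:
        return conflicts
    for key, value in fields.items():
        if entry.get(key) is None:
            entry[key] = value
    return []
```

Collecting conflicts in a first pass and writing in a second is what keeps the entry unmodified when any field disagrees.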

A `TryResolveVcpFields` orchestrator layers the two sources before
stamping: ADO is the higher-priority source per the documented
"env-takes-priority" rule, GHA fills any gap where ADO is silent, and
fields populated on both sources MUST agree or the verb aborts with a
diagnostic naming both sources. The stamper itself is source-agnostic -
it takes the resolved (repositoryUri, revisionId, branch) triple and
does not know which env produced each field.
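
The layering rule can be sketched as a per-field merge. This is a hedged illustration: `resolve_vcp_fields` is a hypothetical name, each source is modeled as an optional dictionary, and plain equality is used where the real code may compare repository URIs more leniently.

```python
from typing import Optional, Tuple

_FIELDS = ("repositoryUri", "revisionId", "branch")


def resolve_vcp_fields(ado: Optional[dict],
                       gha: Optional[dict]) -> Tuple[dict, list]:
    """Layer ADO over GHA; return (resolved fields, disagreement errors)."""
    resolved, errors = {}, []
    for key in _FIELDS:
        from_ado = (ado or {}).get(key)
        from_gha = (gha or {}).get(key)
        if from_ado is not None and from_gha is not None and from_ado != from_gha:
            # Fields populated on both sources must agree.
            errors.append(f"{key}: ADO {from_ado!r} vs GitHub Actions {from_gha!r}")
            continue
        # ADO takes priority; GHA fills any gap where ADO is silent.
        resolved[key] = from_ado if from_ado is not None else from_gha
    return resolved, errors
```

The caller would abort on a non-empty error list and otherwise hand the resolved triple to the source-agnostic stamper.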

Probe-before-write semantics on the single-entry path leave the JObject
unchanged when a conflict is detected, matching the existing
`TryStampAdoContext` contract - a half-stamped VCP is worse than a
clean refusal.

Why no disk-git fallback?
The verb deliberately does not shell out to `git.exe` to recover this
data from the working tree. The two CI envs cover both surfaces an AI
scanner lands in, `add-result`'s producer-supplied JSON is the
universal escape hatch for everything else, and adding a soft runtime
dependency on `git.exe` (with its own failure modes around shallow
clones, detached HEADs, and non-existent branches) would make the
verb's output depend on disk state. Producers running locally outside
CI either set the env vars themselves or populate VCP directly in the
input JSON.

CI fixture isolation
`CweGenerateSample.ps1`'s `\` map gains the two new optional
ADO vars (`BUILD_REPOSITORY_URI` set to the resolved git remote URL;
`BUILD_SOURCEVERSION` set to the same zero-SHA placeholder the
hardcoded VCP entry carries) so the -GHAzDO variant stamps a fully
populated ADO env shape and AdoPipelineContext detects no conflict
against the supplied VCP. The same map also adds `\` entries for
`GITHUB_ACTIONS` / `GITHUB_SERVER_URL` / `GITHUB_REPOSITORY` /
`GITHUB_SHA` / `GITHUB_REF_NAME` / `GITHUB_REF` so that ambient
GitHub Actions env on the macOS CI runner (which sets a real
`GITHUB_SHA`) cannot trip GitHubActionsContext into reporting a
revisionId that conflicts with the zero-SHA placeholder. Without this
scrubbing, both `CweSample_Sarif_IsByteIdenticalToCweGenerateSampleScriptOutput`
and `CweGHAzDoSample_Sarif_IsByteIdenticalToCweGenerateSampleScriptOutput`
break under macos-latest. The same property is now gated by
`CweGHAzDoSample_RegenerationSucceeds_WhenAmbientGitHubActionsEnvVarsConflict`,
the GHA-side parallel of the existing ambient-ADO regression test.

Closes #2957.

Tests: 7 new EmitInitRunCommandTests covering GHA-only stamping, ADO+GHA
agreement, the gap-fill path, cross-source disagreement on each field,
GHA partial-env refusal, and producer-supplied conflicts under GHA. 11
new GitHubActionsContextTests covering detection states, malformed
inputs, REF_NAME vs REF precedence, and URI normalization. 1 new
CweGeneratedSampleTests fact gating ambient-GHA-env isolation. The
original 11 ADO VCP EmitInitRunCommandTests and 9 AdoPipelineContextTests
still green. 263/264 in Test.UnitTests.Sarif.Multitool.Library and
4/4 CweGeneratedSampleTests pass locally.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Stamp ReleaseHistory v5.0.2 section with nuget links and bump VersionPrefix (#2960)

Promotes the v5.0.2 work to ship-ready:

- `src/build.props`: `VersionPrefix` 5.0.1 -> 5.0.2,
  `PreviousVersionPrefix` 5.0.0 -> 5.0.1.
- `ReleaseHistory.md`: stamp the `v5.0.2` header with nuget links
  for Sdk / Driver / Converters / Multitool / Multitool Library
  (UNRELEASED -> shipped).
- `skills/emit-sarif-findings/SKILL.md`: bump the recommended
  Sarif.Multitool minimum from 5.0.1 to 5.0.2. v5.0.2 is the first
  release where `emit-init-run` enriches versionControlProvenance
  from the CI pipeline environment (Azure DevOps + GitHub Actions),
  which the skill's required commit-sha / branch / repo-uri inputs
  are stamped from automatically.

Six v5.0.2 bullets ship:
* BRK: scripts/Generate-CweTaxonomy.ps1 -> scripts/generate_cwe_taxonomy.py
  (#2921 / #2950).
* NEW: `emit-init-run` enriches `versionControlProvenance` from
  ADO + GitHub Actions env (#2957 / #2959).
* NEW: GHAzDO1021 `ProvideShortBranchNameInVcp` (#2954 / #2958).
* BUG: Drop AI1015 `ProvideRunDefaultSourceLanguage` (#2948).
* BUG: SARIF1012 NRE on unresolved ruleId (#2944 / #2949).
* BUG: SARIF1001 case-fold relaxation for AI notification descriptors
  (#2951 / #2955).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Revert GHAzDO1021 + short-form branch normalization (false premise) (#2962)

The GHAzDO1021 rule (``ProvideShortBranchNameInVcp``) and the related
short-form branch-name normalization in the ADO and GHA env enrichers
were built on a false premise. The original observation -- that the
GHAzDO/AdvSec ingestion service silently dropped ``run.versionControlProvenance[]``
entries whose ``branch`` started with ``refs/<class>/`` -- turned out
to be job-processing latency misread as validation failure. A product
engineer from the AdvSec team has since confirmed that ingestion
accepts both short (``main``) and long (``refs/heads/main``) shapes,
so there is nothing to warn about and nothing to normalize.

This change reverts the lot before v5.0.2 ships:

* Delete ``GHAzDO1021.ProvideShortBranchNameInVcp`` rule plus its
  four Valid/Invalid Inputs and ExpectedOutputs fixtures, the resx +
  Designer entries, and the ``RuleId`` constant.
* ``AdoPipelineContext``: drop ``s_branchRefPrefixRegex``,
  ``BranchShortName``, and ``NormalizeBranchRef``. ``TryDetect``
  passes ``BUILD_SOURCEBRANCH`` through verbatim; ``BranchRef`` is
  the sole branch property, used directly when stamping VCP.
* ``GitHubActionsContext``: drop the ``GITHUB_REF_NAME`` fallback
  entirely (the runner always sets both env vars, so this is invisible
  in production but keeps the property honestly long-form). Rename
  ``BranchShortName`` -> ``BranchRef``; new ``TryReadOptionalBranchRef``
  is a pass-through.
* ``EmitInitRunCommand``: rename ``vcpBranchShortName`` -> ``vcpBranch``;
  ``TryResolveVcpFields`` / ``TryStampVcp`` parameter renames; doc
  comments updated.
* Tests: ``ValidateCommandTests`` drops the two GHAzDO1021_* methods;
  ``AdoPipelineContextTests`` renames the four "BranchShortName_Strips*"
  tests to "Passes*Through" and asserts only ``BranchRef`` (long form);
  ``GitHubActionsContextTests`` switches to ``GITHUB_REF`` setup, drops
  the two RefName-preference tests in favour of three pass-through
  tests; ``EmitInitRunCommandTests`` updates four env setups and six
  assertions to the long form.
* ``CweGenerateSample.ps1`` supplies ``branch = 'refs/heads/main'`` so
  the GHAzDO-variant sample agrees with the env-derived long form on
  cross-source check; ``CweSample.sarif`` and ``CweGHAzDoSample.sarif``
  regenerated.
* ``ReleaseHistory.md`` v5.0.2 UNRELEASED: drop the GHAzDO1021 NEW
  bullet; soften the VCP-enrichment bullet wording to reflect
  pass-through semantics (no short-form derivation).

No version bump: v5.0.2 is unreleased (the dev->main promote PR is
open and blocked); this lands as part of the same release window.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Derek Morris <Penguinwizzard@users.noreply.github.com>