
ci(notify): file/update ci-broken GitHub issue when internal AzDO build breaks on main/release/* #17920

Open
radical wants to merge 4 commits into microsoft:main from radical:radical/notify-internal-build-failure


@radical radical commented Jun 4, 2026

Member

Internal AzDO build failures on main and release/* had no automated visibility on the public tracker — a broken build could sit unnoticed until someone happened to look at the AzDO build list.

What it does

On every non-PR build of main or release/*:

  • One or more stages Failed → file (or update) a ci-broken issue on microsoft/aspire, titled Internal build broken on <branch>, assigned to @joperezr + @radical. The body carries a managed failures table that grows a row per subsequent failure on the same branch (build link, commit, failed stages). A follow-up comment fires on each failure so @-mentions notify.
  • All stages Succeeded → close the open ci-broken issue for that branch with a green-build comment.
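
The managed failures table described above can be sketched roughly as follows. This is a Python illustration, not the PR's PowerShell; the region markers `TABLE_START` / `TABLE_END`, the column headings, and the helper names are all hypothetical, since the actual syntax is only documented in docs/ci/internal-build-failure-notifications.md.

```python
# Hypothetical region markers: the real ones are defined by the PR's docs.
TABLE_START = "<!-- failures-table:start -->"
TABLE_END = "<!-- failures-table:end -->"
HEADER = "| Build | Commit | Failed stages |\n| --- | --- | --- |"


def format_row(build_url: str, commit: str, failed_stages: list[str]) -> str:
    """One table row: build link, short commit, comma-separated stages."""
    return f"| {build_url} | {commit[:7]} | {', '.join(failed_stages)} |"


def new_issue_body(first_row: str) -> str:
    """Body for a freshly filed issue: header plus one row inside the region."""
    return f"{TABLE_START}\n{HEADER}\n{first_row}\n{TABLE_END}"


def append_row(body: str, row: str) -> str:
    """Insert `row` just before the end marker, leaving everything else untouched."""
    idx = body.find(TABLE_END)
    if idx == -1:
        raise ValueError("managed region not found in issue body")
    return body[:idx] + row + "\n" + body[idx:]
```

Keeping the table between explicit markers means a later failure only rewrites that region, so any text a human adds elsewhere in the issue body survives the update.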

One open issue per branch, deduplicated by a hidden HTML-comment marker in the body. Full contract (labels, marker syntax, dedupe behavior, dry-run, manual cleanup) in docs/ci/internal-build-failure-notifications.md.
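
A minimal sketch of the marker-based dedupe, in Python rather than the PR's PowerShell. The marker format is the one quoted in the commit message (`<!-- aspire-internal-build-broken:<branch> -->`); the issue dictionaries, the function names, and the choice of "lowest issue number wins" as the meaning of "older" are assumptions made for illustration.

```python
from typing import Optional


def marker_for(branch: str) -> str:
    """Hidden HTML comment that ties an issue to one branch."""
    return f"<!-- aspire-internal-build-broken:{branch} -->"


def find_open_issue(issues: list[dict], branch: str) -> Optional[dict]:
    """Return the oldest open issue whose body carries the branch marker.

    `issues` is assumed to be the already-fetched list of open ci-broken
    issues, each with at least `number` and `body` keys.
    """
    marker = marker_for(branch)
    matches = [i for i in issues if marker in (i.get("body") or "")]
    # Assumption: a lower issue number means the issue was filed earlier.
    return min(matches, key=lambda i: i["number"]) if matches else None
```

Because the marker ends with ` -->`, a search for `release/13` does not accidentally match an issue filed for `release/13.5`.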

How

Notify-GitHubOnBuildResult.ps1 is invoked from two new pipeline stages (notify_failure, notify_success), authenticates via an aspire-repo-bot GitHub App installation token, and drives the gh CLI. It always exits 0: a flaky notification path must never turn an otherwise-correct build red.

A notifyOnFailureDryRun queue-time parameter logs the would-be gh calls without mutating GitHub.
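
The dry-run behavior might look something like this sketch. It is Python, not the script's PowerShell, and `run_gh` is a hypothetical helper name; only the idea (log the gh command instead of executing it) comes from the PR description.

```python
import shlex
import subprocess


def run_gh(args: list[str], dry_run: bool) -> bool:
    """Run `gh <args>`, or only log the command when dry_run is set.

    Returns True on success. Never raises for a failed or missing gh,
    so the caller can downgrade problems to warnings.
    """
    cmd = ["gh", *args]
    if dry_run:
        print(f"[dry-run] would run: {shlex.join(cmd)}")
        return True
    try:
        return subprocess.run(cmd, check=False).returncode == 0
    except OSError as exc:  # e.g. gh is not installed on the agent
        print(f"warning: could not start gh: {exc}")
        return False
```

For example, `run_gh(["issue", "close", "123", "--repo", "microsoft/aspire"], dry_run=True)` prints the command line and touches nothing on GitHub; the issue number here is a placeholder.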

Validated on AzDO

  • Dry-run: build 2992742 — pwsh + script + -FailedStages on Linux, exits 0 cleanly.
  • Live-mode: build 2992761 — filed #17938 on a throwaway release/aspire-internal-notify-validation marker, then closed it in the same build via gh issue close.

github-actions Bot commented Jun 4, 2026

Contributor

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 17920

Or

  • Run remotely in PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 17920"

@radical radical force-pushed the radical/notify-internal-build-failure branch from 0e6133b to fe60937 Compare June 4, 2026 22:59
@radical radical closed this Jun 4, 2026
@radical radical reopened this Jun 4, 2026
@microsoft-github-policy-service microsoft-github-policy-service Bot added this to the 13.5 milestone Jun 4, 2026
@radical radical closed this Jun 4, 2026
@radical radical reopened this Jun 4, 2026
@radical radical changed the title ci: file GitHub issue when internal AzDO build breaks on main/release ci(notify): file/update ci-broken GitHub issue when internal AzDO build breaks on main/release/* Jun 5, 2026
@radical radical marked this pull request as ready for review June 5, 2026 05:23
Copilot AI review requested due to automatic review settings June 5, 2026 05:23
radical and others added 3 commits June 5, 2026 01:26
PowerShell helper that files / updates / closes a ci-broken GitHub issue
on microsoft/aspire for a given branch. Two modes:

- Failure: GETs open ci-broken issues, filters by a hidden HTML-comment
  marker (<!-- aspire-internal-build-broken:<branch> -->), and either
  creates a new issue (with a managed failures-table region in the body)
  or appends a row to the existing one and posts a follow-up @-mention
  comment.
- Success: closes the matching open issue with a green-build comment.

Drives the gh CLI throughout, authenticated via $env:GH_TOKEN that the
caller sets after minting an aspire-repo-bot installation token. List
uses the strongly-consistent /issues endpoint, not /search/issues, so
near-simultaneous failures don't each file a duplicate; a post-create
re-list catches the rare race past that window and closes ours as a
duplicate of the older.

Always exits 0. Warnings surface via task.logissue + task.complete
result=SucceededWithIssues so a silent regression (bot loses permission,
label deleted, gh API shape changes) goes yellow rather than green.

-DryRun logs the would-be gh calls without mutating GitHub. In dry-run
the script skips token mint entirely so the wrapper can validate
pipeline plumbing without resolving aspire-repo-bot credentials.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
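
The "always exits 0, but goes yellow on problems" behavior from the commit message above can be approximated as below. This is a Python sketch under assumptions, not the script itself: `warn`, `finish`, and `main` are hypothetical names, while the `##vso[task.logissue]` and `##vso[task.complete]` lines are the Azure Pipelines logging commands the commit message refers to.

```python
def warn(message: str) -> None:
    """Surface a warning on the Azure Pipelines build summary."""
    print(f"##vso[task.logissue type=warning]{message}")


def finish(had_warnings: bool) -> int:
    """Mark the task yellow if anything went wrong, but always return 0."""
    if had_warnings:
        print("##vso[task.complete result=SucceededWithIssues;]Notification finished with warnings")
    return 0


def main(notify) -> int:
    """Run the notification callable; never let its failure fail the build."""
    had_warnings = False
    try:
        notify()
    except Exception as exc:  # deliberately broad: this path must not go red
        warn(f"notification failed: {exc}")
        had_warnings = True
    return finish(had_warnings)
```

The return value of `main` would be passed to `sys.exit`, so the process exit code stays 0 whether or not the notification succeeded.
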
…ipeline

Adds two stages to azure-pipelines.yml that gate on the upstream
build_sign_native / build / prepare_installers stage results and run
Notify-GitHubOnBuildResult.ps1 on 1es-ubuntu-2204:

- notify_failure: fires when at least one upstream stage Failed.
  Composes a comma-separated -FailedStages list from
  dependencies.<stage>.result so the filed / updated issue body
  identifies which stage broke.
- notify_success: fires when all three upstream stages Succeeded
  / SucceededWithIssues (prepare_installers may legitimately Skip
  on stable GA release builds). Closes the open ci-broken issue
  for the branch.

Both stages mint the aspire-repo-bot installation token via
Get-AspireBotInstallationToken.ps1 and export it as GH_TOKEN so the
gh CLI invocations in the script authenticate as the bot.

Adds _IsNotificationBranch in common-variables.yml — exact match
on refs/heads/main (NOT startsWith, the pipeline trigger's `main*`
wildcard would otherwise sweep in branches like main-something) plus
startsWith refs/heads/release/. Excludes internal/release/* so
internal branch names don't leak into the public tracker.

Aspire-Release-Secrets variable group is imported at pipeline scope
with the same non-PR + main/release/* gate, so manual feature-branch
and PR runs don't pay the variable-group auth check at queue time.

A notifyOnFailureDryRun queue-time parameter logs would-be gh calls
without mutating GitHub; applies to both stages so a green-build
dry-run can't accidentally close real open issues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
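
The gating rules in the commit message above can be restated as a small Python sketch. The real logic lives in Azure Pipelines YAML expressions; the function names here are hypothetical, and the stage names and result strings are taken from the commit text.

```python
def is_notification_branch(ref: str) -> bool:
    """Exact match on main, prefix match on release/.

    refs/heads/main-something and refs/heads/internal/release/* both
    fall through, because neither condition matches them.
    """
    return ref == "refs/heads/main" or ref.startswith("refs/heads/release/")


def failed_stages(results: dict[str, str]) -> str:
    """Comma-separated list of stages whose result is Failed."""
    return ",".join(name for name, result in results.items() if result == "Failed")


def is_green(results: dict[str, str]) -> bool:
    """All stages succeeded; only prepare_installers is allowed to be Skipped."""
    ok = {"Succeeded", "SucceededWithIssues"}
    for name, result in results.items():
        if result in ok:
            continue
        if name == "prepare_installers" and result == "Skipped":
            continue
        return False
    return True
```

Using an exact comparison for main is what keeps a branch such as main-something out of the notification path, even though the pipeline trigger's `main*` wildcard would build it.
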
Adds docs/ci/internal-build-failure-notifications.md describing the
contract (labels, marker syntax, failures-table region, dedupe
behavior, dry-run, manual cleanup) and pointer from
eng/pipelines/README.md so anyone reading the pipeline docs lands
on the notification system.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@radical radical force-pushed the radical/notify-internal-build-failure branch from b3883d4 to 7870f74 Compare June 5, 2026 05:26
@radical radical marked this pull request as draft June 5, 2026 05:27
@radical radical marked this pull request as ready for review June 5, 2026 05:29
@radical radical added the area-engineering-systems label Jun 5, 2026

Copilot AI left a comment

Contributor


Pull request overview

Adds automated public GitHub issue visibility for internal AzDO build breaks on main and release/*, so internal failures no longer go unnoticed until someone manually checks AzDO.

Changes:

  • Introduces a PowerShell notifier that files/updates a branch-deduped ci-broken issue on failures and closes it on the next green build.
  • Adds notify_failure / notify_success stages to the internal pipeline, plus a queue-time dry-run parameter and branch gating.
  • Documents the notification contract and links it from pipeline docs.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| eng/pipelines/scripts/Notify-GitHubOnBuildResult.ps1 | New notifier script that uses gh + an installation token to create/update/close ci-broken issues per branch. |
| eng/pipelines/README.md | Adds a short pointer describing the internal build-result notification behavior and links to full docs. |
| eng/pipelines/common-variables.yml | Adds _IsNotificationBranch to gate notifications to main/release/* (excluding internal/release/*). |
| eng/pipelines/azure-pipelines.yml | Adds dry-run parameter, imports secrets conditionally, and adds notify_failure/notify_success stages. |
| docs/ci/internal-build-failure-notifications.md | New contract documentation for behavior, dedupe strategy, and operational expectations. |

Comment thread eng/pipelines/scripts/Notify-GitHubOnBuildResult.ps1 Outdated
Comment thread eng/pipelines/azure-pipelines.yml
Comment thread eng/pipelines/azure-pipelines.yml Outdated
Comment thread docs/ci/internal-build-failure-notifications.md Outdated
Comment thread docs/ci/internal-build-failure-notifications.md Outdated
- Notify-GitHubOnBuildResult.ps1: register the GitHub App installation
  token via `##vso[task.setsecret]` instead of `##vso[task.setvariable
  ...;issecret=true]`. The only purpose was log masking, but
  task.setvariable also persists the value as a job-scoped variable
  that other tasks could accidentally reference via $(__notifyGhToken).
  task.setsecret gives us the masking without the persistence.

- azure-pipelines.yml + docs: dry-run mode logs the `gh` CLI commands
  it would run, not GitHub REST calls. Fix the parameter displayName,
  the parameter comment, and the corresponding docs paragraph.

- docs: Azure Pipelines stage results use `Skipped`, not `Skip` (the
  YAML stage condition correctly checks for 'Skipped'). Fix the prose.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
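
To make the first bullet of the commit above concrete, here is a small Python sketch of the two Azure Pipelines logging commands it contrasts. The helper names are hypothetical and the token value is a placeholder; the command strings follow the documented `##vso[...]` syntax, and the variable name `__notifyGhToken` is the one mentioned in the commit message.

```python
def mask_only(value: str) -> str:
    """Register `value` for log masking without creating a variable."""
    return f"##vso[task.setsecret]{value}"


def mask_and_persist(name: str, value: str) -> str:
    """Create a secret variable: masked in logs, and also stored under `name`."""
    return f"##vso[task.setvariable variable={name};issecret=true]{value}"


if __name__ == "__main__":
    token = "not-a-real-token"  # placeholder, never a real credential
    # The agent picks the command up when it appears on stdout.
    print(mask_only(token))
```

The second form is the one the commit moves away from: it masks the value too, but leaves it addressable as `$(__notifyGhToken)` by later tasks in the job.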
@radical radical requested review from adamint and sebastienros June 8, 2026 17:12
@radical radical mentioned this pull request Jun 9, 2026
11 tasks