
JobMonitor: emit in-progress AzDO + unfinished Helix details on timeout/cancellation#16878

Open
Copilot wants to merge 5 commits into main from copilot/job-monitor-cancellation-info

Copilot AI (Contributor) commented May 23, 2026

When Job Monitor exits due to timeout or cancellation, its logs only partially explain what work was left. This change makes timeout diagnostics explicit for both Helix jobs and non-monitor AzDO jobs, so the remaining and still-running work is visible in the AzDO timeline log.

  • Description
    • Timeout diagnostics now include AzDO job state
      Captures the latest stage-filtered timeline records during polling and, on cancellation, logs non-monitor jobs that have not completed (pending/inProgress), including their state and result.
    • Helix timeout output is more actionable
      Timeout reporting now lists the latest unfinished or unprocessed Helix attempts with display name, status, initial work item count, and details URL.
    • Null-safe ordering for AzDO timeout output
      In-progress AzDO jobs are sorted safely even when a timeline record's Name is null, by falling back to ReferenceName and then Identifier; this keeps timeout reporting from throwing when comparing multiple jobs.
    • Focused regression coverage
      MonitorTimesOut_CancelsLatestInFlightHelixJobs asserts both new timeout log outputs (unfinished Helix jobs and in-progress AzDO jobs). The merge-conflict resolution preserves the async cancellation/upload synchronization behavior from main.
```csharp
// Sort by Name, falling back to ReferenceName and then Identifier when Name is null.
var inProgressPipelineJobs = GetInProgressNonMonitorPipelineJobs(latestTimelineRecords, _options.JobMonitorName)
    .OrderBy(r => r.Name ?? r.ReferenceName ?? r.Identifier ?? string.Empty, StringComparer.OrdinalIgnoreCase)
    .ToList();
```
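The filtering and ordering described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `TimelineRecord` here is a hypothetical, simplified record; the body given for `GetInProgressNonMonitorPipelineJobs` (keep `Job` records whose state is not `completed`, skip the record named after the monitor job) is a guess; and `FormatForTimeoutLog` is an invented helper. Only the ordering key mirrors the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for an AzDO timeline record; the real type has more fields.
public sealed record TimelineRecord(
    string? Name,
    string? ReferenceName,
    string? Identifier,
    string Type,      // e.g. "Stage", "Job", "Task"
    string State,     // "pending", "inProgress", or "completed"
    string? Result);  // e.g. "succeeded", "failed"; null while the job has not finished

public static class TimeoutDiagnostics
{
    // Hypothetical implementation: keep Job records that have not completed,
    // excluding the job monitor's own job (matched by name).
    public static IEnumerable<TimelineRecord> GetInProgressNonMonitorPipelineJobs(
        IEnumerable<TimelineRecord> records, string jobMonitorName) =>
        records.Where(r =>
            string.Equals(r.Type, "Job", StringComparison.OrdinalIgnoreCase)
            && !string.Equals(r.State, "completed", StringComparison.OrdinalIgnoreCase)
            && !string.Equals(r.Name, jobMonitorName, StringComparison.OrdinalIgnoreCase));

    // Same null-safe sort key as the PR snippet: Name, then ReferenceName, then Identifier.
    public static List<TimelineRecord> OrderForTimeoutLog(IEnumerable<TimelineRecord> records) =>
        records
            .OrderBy(r => r.Name ?? r.ReferenceName ?? r.Identifier ?? string.Empty, StringComparer.OrdinalIgnoreCase)
            .ToList();

    // Hypothetical per-job log line: display key plus state and result.
    public static string FormatForTimeoutLog(TimelineRecord r) =>
        $"{r.Name ?? r.ReferenceName ?? r.Identifier ?? "<unnamed>"} (state: {r.State}, result: {r.Result ?? "none"})";
}
```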

To double check:

Copilot AI requested review from Copilot and removed request for Copilot May 23, 2026 01:43
Copilot AI requested review from Copilot and removed request for Copilot May 23, 2026 01:48
Copilot AI requested review from Copilot and removed request for Copilot May 23, 2026 01:52
Copilot AI changed the title from "[WIP] Add job monitor logging for in-progress jobs on cancellation" to "JobMonitor: emit in-progress AzDO + unfinished Helix details on timeout/cancellation" May 23, 2026
Copilot AI requested a review from mmitche May 23, 2026 01:53
premun previously approved these changes May 25, 2026
premun marked this pull request as ready for review May 25, 2026 09:05
Copilot AI review requested due to automatic review settings May 25, 2026 09:05
Copilot AI (Contributor) left a comment


Pull request overview

Improves JobMonitor timeout/cancellation diagnostics so logs explicitly call out (1) unfinished/unprocessed Helix job attempts and (2) non-monitor Azure DevOps pipeline jobs still queued/in progress at the time the monitor exits, with a regression test asserting both log surfaces.

Changes:

  • Track the latest stage-filtered AzDO timeline records during polling and emit in-progress/queued non-monitor job details on timeout/cancellation.
  • Enhance timeout reporting to list unfinished/unprocessed Helix attempts with additional per-job context (status, initial work item count, details URL).
  • Extend MonitorTimesOut_CancelsLatestInFlightHelixJobs to assert both the Helix and AzDO timeout log outputs.
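For the Helix side, a comparable sketch of "latest unfinished or unprocessed attempt" selection and per-attempt formatting might look like this. `HelixJobAttempt`, its fields, the `"Finished"` status string, and `HelixTimeoutReport` are all hypothetical; the PR's actual types and the Helix status values it checks may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape for a Helix job attempt tracked by the monitor.
public sealed record HelixJobAttempt(
    string JobKey,             // identifies the logical job across retries
    int Attempt,               // 1-based attempt number
    string DisplayName,
    string Status,             // assumed values, e.g. "Running", "Finished"
    int InitialWorkItemCount,
    string DetailsUrl,
    bool Processed);           // whether the monitor has already handled the results

public static class HelixTimeoutReport
{
    // Keep only the latest attempt per job, then those that are unfinished or not yet processed.
    public static List<HelixJobAttempt> LatestUnfinished(IEnumerable<HelixJobAttempt> attempts) =>
        attempts
            .GroupBy(a => a.JobKey)
            .Select(g => g.OrderByDescending(a => a.Attempt).First())
            .Where(a => !string.Equals(a.Status, "Finished", StringComparison.OrdinalIgnoreCase) || !a.Processed)
            .OrderBy(a => a.DisplayName, StringComparer.OrdinalIgnoreCase)
            .ToList();

    // One log line per attempt, carrying the fields listed in the PR description.
    public static string Format(HelixJobAttempt a) =>
        $"{a.DisplayName} (status: {a.Status}, initial work items: {a.InitialWorkItemCount}) {a.DetailsUrl}";
}
```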

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/Microsoft.DotNet.Helix/JobMonitor/JobMonitorRunner.cs | Captures the latest timeline snapshot during polling and expands timeout logging for both Helix attempts and AzDO pipeline job state. |
| src/Microsoft.DotNet.Helix/Sdk.Tests/Microsoft.DotNet.Helix.Sdk.Tests/JobMonitorRunnerTests.cs | Updates the timeout-cancellation test to use a recording logger and assert the new timeout diagnostics. |
Comments suppressed due to low confidence (1)

src/Microsoft.DotNet.Helix/JobMonitor/JobMonitorRunner.cs:1006

  • This LogError message template is constructed via string interpolation to include Environment.NewLine. To match the rest of JobMonitorRunner (which uses {nl} placeholders) and keep the template constant/structured, use {nl} in the template and pass Environment.NewLine as arguments instead of embedding it into the string.
```csharp
_logger.LogError(
    $"At timeout, {{InProgressCount}} non-monitor Azure DevOps pipeline job(s) were still in progress or queued:{Environment.NewLine}- {{InProgressJobs}}{Environment.NewLine}",
    inProgressPipelineJobs.Count,
    string.Join(Environment.NewLine + "- ", inProgressPipelineJobs.Select(FormatInProgressPipelineJobForTimeoutLog)));
```
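A minimal sketch of the reviewer's suggestion, assuming Microsoft.Extensions.Logging semantics, where each placeholder occurrence in a message template binds to an argument by position rather than by name (so `{nl}` appearing twice needs two arguments). `TimeoutLogTemplate` and `BuildArgs` are hypothetical names for illustration; the fix actually applied in the PR may look different.

```csharp
using System;
using System.Collections.Generic;

public static class TimeoutLogTemplate
{
    // Constant message template: the newline is a named {nl} placeholder instead of being
    // interpolated into the string, so the template text is identical on every call.
    public const string InProgressJobsTemplate =
        "At timeout, {InProgressCount} non-monitor Azure DevOps pipeline job(s) were still in progress or queued:{nl}- {InProgressJobs}{nl}";

    // Arguments in placeholder order: {InProgressCount}, {nl}, {InProgressJobs}, {nl}.
    public static object[] BuildArgs(IReadOnlyList<string> formattedJobs) => new object[]
    {
        formattedJobs.Count,
        Environment.NewLine,
        string.Join(Environment.NewLine + "- ", formattedJobs),
        Environment.NewLine,
    };
}
```

At the call site this would become `_logger.LogError(TimeoutLogTemplate.InProgressJobsTemplate, TimeoutLogTemplate.BuildArgs(formattedJobs))`, with the array binding to the `params object[] args` parameter of the `LogError` extension method.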

Comment thread src/Microsoft.DotNet.Helix/JobMonitor/JobMonitorRunner.cs
mmitche (Member) commented May 26, 2026

@copilot resolve the merge conflicts in this pull request

Copilot AI (Contributor, Author) commented May 26, 2026

> @copilot resolve the merge conflicts in this pull request

Resolved. I merged origin/main, fixed the conflict in src/Microsoft.DotNet.Helix/Sdk.Tests/Microsoft.DotNet.Helix.Sdk.Tests/JobMonitorRunnerTests.cs, and pushed merge commit 53bfeb5.

Copilot AI requested review from Copilot and removed request for Copilot May 26, 2026 15:08

Labels: None yet

Projects: None yet

Development: Successfully merging this pull request may close these issues:

Job Monitor should print information about in-progress jobs on cancellation/exit

4 participants