JobMonitor: emit in-progress AzDO + unfinished Helix details on timeout/cancellation #16878
Copilot wants to merge 5 commits into
Agent-Logs-Url: https://github.com/dotnet/arcade/sessions/c103e3cf-afe4-44dc-8ce7-56e30b442048 Co-authored-by: mmitche <8725170+mmitche@users.noreply.github.com>
Pull request overview
Improves JobMonitor timeout/cancellation diagnostics so logs explicitly call out (1) unfinished/unprocessed Helix job attempts and (2) non-monitor Azure DevOps pipeline jobs still queued/in progress at the time the monitor exits, with a regression test asserting both log surfaces.
Changes:
- Track the latest stage-filtered AzDO timeline records during polling and emit in-progress/queued non-monitor job details on timeout/cancellation.
- Enhance timeout reporting to list unfinished/unprocessed Helix attempts with additional per-job context (status, initial work item count, details URL).
- Extend `MonitorTimesOut_CancelsLatestInFlightHelixJobs` to assert both the Helix and AzDO timeout log outputs.
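The AzDO half of the change can be pictured as a filter over the latest timeline snapshot. The sketch below is not the PR's code: the tuple shape, the `SelectInProgressPipelineJobs` helper, the sample records, and identifying the monitor's own job by a `JobMonitor` reference name are all hypothetical. It keeps `Job` records whose state is still `pending` or `inProgress`, orders them by `Name` with a fallback to `ReferenceName`/`Identifier` (the null-safe ordering the PR description mentions), and formats each with its state and result.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative snapshot only; the tuple is a hypothetical stand-in for an AzDO timeline record.
// State uses the AzDO timeline values "pending", "inProgress" and "completed".
var snapshot = new List<(string? Name, string? ReferenceName, string? Identifier, string Type, string State, string? Result)>
{
    ("Build_Windows", "Build_Windows", "Stage.Build_Windows", "Job", "completed", "succeeded"),
    (null, "Test_Linux", "Stage.Test_Linux", "Job", "inProgress", null),
    ("Helix Job Monitor", "JobMonitor", "Stage.JobMonitor", "Job", "inProgress", null),
    ("Test_Mac", "Test_Mac", "Stage.Test_Mac", "Job", "pending", null),
};

// Assumption: the monitor's own job is recognized by its reference name.
const string MonitorReferenceName = "JobMonitor";

var inProgress = SelectInProgressPipelineJobs(snapshot, MonitorReferenceName);
Console.WriteLine($"{inProgress.Count} non-monitor job(s) still in progress or queued:");
foreach (var line in inProgress)
{
    Console.WriteLine("- " + line);
}

// Keeps non-monitor jobs that have not completed and formats each with its state and result.
static List<string> SelectInProgressPipelineJobs(
    IEnumerable<(string? Name, string? ReferenceName, string? Identifier, string Type, string State, string? Result)> records,
    string monitorReferenceName)
{
    return records
        .Where(r => string.Equals(r.Type, "Job", StringComparison.OrdinalIgnoreCase))
        .Where(r => !string.Equals(r.ReferenceName, monitorReferenceName, StringComparison.Ordinal))
        .Where(r => r.State is "pending" or "inProgress")
        // Name may be null, so fall back to ReferenceName, then Identifier, to get a usable sort key.
        .OrderBy(r => r.Name ?? r.ReferenceName ?? r.Identifier ?? string.Empty, StringComparer.Ordinal)
        .Select(r => $"{r.Name ?? r.ReferenceName ?? r.Identifier} (state: {r.State}, result: {r.Result ?? "none"})")
        .ToList();
}
```

Ordinal comparison keeps the ordering independent of the current culture; the actual implementation may use a different comparer.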
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Microsoft.DotNet.Helix/JobMonitor/JobMonitorRunner.cs | Captures latest timeline snapshot during polling and expands timeout logging for both Helix attempts and AzDO pipeline job state. |
| src/Microsoft.DotNet.Helix/Sdk.Tests/Microsoft.DotNet.Helix.Sdk.Tests/JobMonitorRunnerTests.cs | Updates timeout-cancellation test to use a recording logger and assert the new timeout diagnostics. |
Comments suppressed due to low confidence (1)
src/Microsoft.DotNet.Helix/JobMonitor/JobMonitorRunner.cs:1006
- This `LogError` message template is constructed via string interpolation to include `Environment.NewLine`. To match the rest of `JobMonitorRunner` (which uses `{nl}` placeholders) and keep the template constant/structured, use `{nl}` in the template and pass `Environment.NewLine` as arguments instead of embedding it into the string.
```csharp
_logger.LogError(
    $"At timeout, {{InProgressCount}} non-monitor Azure DevOps pipeline job(s) were still in progress or queued:{Environment.NewLine}- {{InProgressJobs}}{Environment.NewLine}",
    inProgressPipelineJobs.Count,
    string.Join(Environment.NewLine + "- ", inProgressPipelineJobs.Select(FormatInProgressPipelineJobForTimeoutLog)));
```
@copilot resolve the merge conflicts in this pull request
Resolved. I merged
When Job Monitor exits due to timeout/cancellation, logs only partially explained what was left. This change makes timeout diagnostics explicit for both Helix and non-monitor AzDO jobs so the remaining/running work is visible in the AzDO timeline log.
Capture the latest stage-filtered timeline records during polling and, on cancellation, log non-monitor jobs that have not completed (`pending`/`inProgress`), including `state` and `result`.
Timeout reporting now lists the latest unfinished/unprocessed Helix attempts with display name, status, initial work item count, and details URL.
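One plausible reading of "latest unfinished/unprocessed Helix attempts" is: keep only the newest attempt of each logical Helix job, then report it if it is still running, or finished but not yet processed by the monitor. The sketch below illustrates that interpretation only. The tuple fields, the `"finished"` status string, the `helix.example` URLs, and `FormatUnfinishedLatestAttempts` are hypothetical and do not come from `JobMonitorRunner`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative data only: each tuple is one submitted Helix job attempt (hypothetical shape).
var attempts = new List<(string LogicalJob, int Attempt, string DisplayName, string Status, int InitialWorkItemCount, string DetailsUrl, bool Processed)>
{
    ("tests-windows", 1, "Windows tests (attempt 1)", "finished", 120, "https://helix.example/jobs/a1", true),
    ("tests-windows", 2, "Windows tests (attempt 2)", "running", 12, "https://helix.example/jobs/a2", false),
    ("tests-linux", 1, "Linux tests (attempt 1)", "finished", 80, "https://helix.example/jobs/b1", false),
    ("tests-mac", 1, "Mac tests (attempt 1)", "finished", 40, "https://helix.example/jobs/c1", true),
};

foreach (var line in FormatUnfinishedLatestAttempts(attempts))
{
    Console.WriteLine("- " + line);
}

// Keeps only the latest attempt of each logical job, then reports it when its status is not
// "finished" (an assumed status value) or when it finished but has not been processed yet.
static List<string> FormatUnfinishedLatestAttempts(
    IEnumerable<(string LogicalJob, int Attempt, string DisplayName, string Status, int InitialWorkItemCount, string DetailsUrl, bool Processed)> allAttempts)
{
    return allAttempts
        .GroupBy(a => a.LogicalJob)
        .Select(g => g.OrderByDescending(a => a.Attempt).First())
        .Where(a => a.Status != "finished" || !a.Processed)
        .OrderBy(a => a.DisplayName, StringComparer.Ordinal)
        .Select(a => $"{a.DisplayName}: status {a.Status}, initial work items {a.InitialWorkItemCount}, details {a.DetailsUrl}")
        .ToList();
}
```

Here the superseded Windows attempt 1 is ignored even though it finished, because only the newest attempt per logical job is considered.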
Ensure in-progress AzDO jobs are sorted safely even when the timeline record's `Name` is null by falling back to `ReferenceName`/`Identifier`, preventing timeout reporting from throwing on multi-job comparisons.
`MonitorTimesOut_CancelsLatestInFlightHelixJobs` asserts both new timeout log surfaces (unfinished Helix jobs and in-progress AzDO jobs), and the merge conflict resolution preserved the async cancellation/upload synchronization behavior from `main`.
To double check: