Batch small Helix work items to reduce per-item overhead (-53% compute) #66808
Draft
mmitche wants to merge 4 commits into
Reduces ~503 work items per build to ~50 by batching compatible small test assemblies (those without special dependencies like IIS/Playwright) into groups of up to 20. Each batched work item runs dotnet test sequentially for each assembly, sharing the per-item setup overhead (tool installs, env config, vstest launcher) across the batch. Expected impact: ~60-70% reduction in total compute per build.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The batch task was using non-existent metadata fields (RuntimeVersion, QueueName, etc.), resulting in empty command arguments. Ubuntu items were submitted with empty runtime/queue args and never picked up by agents. Now parses these values from the first item's existing Command string.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When one assembly in a batch fails, the entire batch exited with code 1, causing the Helix SDK to treat it as a failed work item and not report test results from the other 19 passing assemblies to AzDO. This caused ~1300 missing tests in the test count. Now batched runs always exit 0 so all results are reported. Individual test failures are visible through the test results XML. This also prevents wasteful retries of entire 20-assembly batches for a single flaky test.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
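The exit-code behavior described in this commit can be illustrated with a minimal Python sketch. It is not the HelixTestRunner code: `run_command`, `run_batch`, and the shape of a command are hypothetical names chosen for illustration, and the real runner invokes dotnet test rather than arbitrary commands.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

Command = Sequence[str]


def run_command(command: Command) -> int:
    """Run one command and return its exit code (hypothetical helper)."""
    return subprocess.run(list(command)).returncode


def run_batch(
    commands: Sequence[Command],
    run: Callable[[Command], int] = run_command,
) -> Tuple[int, List[Command]]:
    """Run every command in order, even after a failure.

    Returns (exit_code, failed_commands). The exit code is always 0 so
    that the batch as a whole is not reported as a failed work item;
    failures are surfaced separately (in the PR, via the test results XML).
    """
    failed: List[Command] = []
    for command in commands:
        if run(command) != 0:
            # Record the failure but keep going so the remaining
            # assemblies still produce results.
            failed.append(command)
    return 0, failed
```

Injecting `run` keeps the sketch testable without launching real processes. The trade-off is the one the commit message names: the work item's exit code no longer signals failure, so failures have to be read from the reported test results.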
Contributor
Hey @dotnet/aspnet-build, looks like this PR is something you want to take a look at.
Member
Author
@wtgodbe This is an attempt to reduce the test overhead in aspnetcore. It is NOT ready (the prototype included some hacks for testing).
Member
Love this idea! Will take a closer look next week, but worth noting that the Helix tests are currently broken into 2 subsets: Lines 236 to 297 in 27c660e aspnetcore/.azure/pipelines/ci-public.yml Lines 562 to 591 in 27c660e
Batch small Helix work items to reduce per-item overhead
Problem
Each ASP.NET Core CI build sends ~503 Helix work items, but most have only 2-10 seconds of actual test execution with ~20-40 seconds of per-item overhead (tool installs, vstest startup, result upload). This wastes ~5.6 compute-hours per build.
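As a quick check of the arithmetic, the sketch below multiplies the work item count by the quoted overhead range. The ~5.6 compute-hours figure appears to correspond to the upper end of that range (40 seconds per item); that reading is an inference from the numbers above, not something the description states. The function name `overhead_hours` is made up for this example.

```python
WORK_ITEMS = 503             # work items per build, from the description
OVERHEAD_SECONDS = (20, 40)  # quoted per-item overhead range, in seconds


def overhead_hours(items: int, seconds_per_item: float) -> float:
    """Total per-item overhead for one build, in compute-hours."""
    return items * seconds_per_item / 3600


low, high = (overhead_hours(WORK_ITEMS, s) for s in OVERHEAD_SECONDS)
print(f"{low:.1f} to {high:.1f} compute-hours of overhead per build")
# prints "2.8 to 5.6 compute-hours of overhead per build"
```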
Solution
Batch compatible small test assemblies into groups of ~20 per Helix work item. Assemblies with special dependencies (IIS, Playwright, Java, Node, MSSQL) remain as individual items.
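A rough Python sketch of this grouping follows. The actual change is an MSBuild inline task in eng/helix/helix.proj; the `WorkItem` type, its `has_special_deps` flag, and `batch_work_items` are hypothetical stand-ins, and grouping by TFM follows the "How it works" notes in this description.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List

MAX_BATCH_SIZE = 20  # "groups of ~20" per the description


@dataclass(frozen=True)
class WorkItem:
    name: str
    tfm: str                # target framework moniker
    has_special_deps: bool  # hypothetical flag: IIS, Playwright, Java, Node, MSSQL


def batch_work_items(
    items: Iterable[WorkItem], max_batch_size: int = MAX_BATCH_SIZE
) -> List[List[WorkItem]]:
    """Return one list per Helix work item to submit.

    Items with special dependencies stay alone; the rest are grouped
    by TFM and split into chunks of at most max_batch_size.
    """
    individual: List[List[WorkItem]] = []
    eligible_by_tfm: Dict[str, List[WorkItem]] = defaultdict(list)
    for item in items:
        if item.has_special_deps:
            individual.append([item])
        else:
            eligible_by_tfm[item.tfm].append(item)

    batches: List[List[WorkItem]] = []
    for tfm_items in eligible_by_tfm.values():
        for start in range(0, len(tfm_items), max_batch_size):
            batches.append(tfm_items[start:start + max_batch_size])
    return individual + batches
```

With 45 eligible assemblies on one TFM and 3 with special dependencies, this yields 3 individual items plus batches of 20, 20, and 5, so 48 work items become 6.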
Measured Results
Per-queue breakdown:
How it works
eng/helix/helix.proj - new BatchSmallWorkItems MSBuild inline task runs after the existing Gather target. Groups eligible items by TFM, creates combined payload directories, writes a targets.txt manifest.
eng/tools/HelixTestRunner - accepts --targets-file targets.txt to run dotnet test sequentially for each assembly. Tool installs happen once per batch. Test results are merged.
eng/helix/content/runtests.cmd / runtests.sh - detect @targets.txt prefix for batched mode. Fully backward compatible.
Batching rules
PreCommands (no IIS, Playwright, Java, Node, MSSQL deps)
CI validation