run-sweep: gate full-sweep PRs behind a sequential canary#1503
Open
Oseltamivir wants to merge 1 commit into
Open
run-sweep: gate full-sweep PRs behind a sequential canary#1503Oseltamivir wants to merge 1 commit into
Oseltamivir wants to merge 1 commit into
Conversation
Comment on lines
+182
to
+210
| needs: canary-select | ||
| if: ${{ needs.canary-select.outputs.canary-config != '' && needs.canary-select.outputs.canary-config != '[]' }} | ||
| uses: ./.github/workflows/benchmark-tmpl.yml | ||
| name: canary / | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| config: ${{ fromJson(needs.canary-select.outputs.canary-config) }} | ||
| secrets: inherit | ||
| with: | ||
| exp-name: ${{ matrix.config.exp-name }} | ||
| isl: ${{ matrix.config.isl }} | ||
| osl: ${{ matrix.config.osl }} | ||
| max-model-len: ${{ matrix.config.max-model-len }} | ||
| runner: ${{ matrix.config.runner }} | ||
| image: ${{ matrix.config.image }} | ||
| model: ${{ matrix.config.model }} | ||
| model-prefix: ${{ matrix.config.model-prefix }} | ||
| framework: ${{ matrix.config.framework }} | ||
| precision: ${{ matrix.config.precision }} | ||
| tp: ${{ matrix.config.tp }} | ||
| ep: ${{ matrix.config.ep }} | ||
| dp-attn: ${{ matrix.config.dp-attn }} | ||
| conc: ${{ matrix.config.conc }} | ||
| spec-decoding: ${{ matrix.config.spec-decoding }} | ||
| disagg: ${{ matrix.config.disagg }} | ||
| run-eval: false | ||
|
|
||
| sweep-multi-node-1k1k: |
2 tasks
When a PR carries `full-sweep-enabled` (and not `evals-only`), pick the lowest-conc single-node benchmark entry as a canary and run it before fanning out the full sweep. If the canary fails, the eight fan-out jobs are skipped to save cluster time on shared failures (bad image tag, removed CLI flag, etc.). Design choices: - Canary candidacy is restricted to single_node['1k1k' | '8k1k'] and excludes entries with run-eval: true, so the canary is always a pure benchmark smoke test using the existing single-node template. - The canary entry is removed from the regular fan-out's matrix (via remaining-search-space-config) only when the canary actually succeeded. On canary skip / cancel / canary-select failure, the regular fan-out falls back to the full search-space-config so coverage is preserved. - The fan-out gate blocks only on `canary-sweep.result == 'failure'` -- every other state (success, skipped, cancelled) proceeds, so a bug in the canary mechanism never blocks the rest of the sweep. - Non-full-sweep PRs, draft PRs, pushes to main, and the reuse path all behave identically to before via existing gates. The aggregated results_bmk artifact picks up both the canary's row and the regular fan-out's rows via the existing bmk_* glob -- each entry appears exactly once.
28ecd9a to
78b51e3
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
When a PR carries
full-sweep-enabled(and notevals-only), pick the lowest-conc single-node benchmark entry as a canary and run it before fanning out the full sweep. If the canary fails, the eight fan-out jobs are skipped to save cluster time on shared failures (bad image tag, removed CLI flag, etc.).Design choices:
single_node['1k1k' | '8k1k']and excludes entries withrun-eval: true, so the canary is always a pure benchmark smoke test using the existing single-node template.remaining-search-space-config) only when the canary actually succeeded. On canary skip / cancel / canary-select failure, the regular fan-out falls back to the full search-space-config so coverage is preserved.canary-sweep.result == 'failure'— every other state (success, skipped, cancelled) proceeds, so a bug in the canary mechanism never blocks the rest of the sweep.The aggregated
results_bmkartifact picks up both the canary's row and the regular fan-out's rows via the existingbmk_*glob — each entry appears exactly once.Docker-tag-monitor research block split out to #1538.
Test plan