run-sweep: gate full-sweep PRs behind a sequential canary #1503

Open
Oseltamivir wants to merge 1 commit into main from sweep-canary-gate

Oseltamivir (Collaborator) commented May 18, 2026

Summary

When a PR carries `full-sweep-enabled` (and not `evals-only`), pick the single-node benchmark entry with the lowest `conc` as a canary and run it before fanning out the full sweep. If the canary fails, the eight fan-out jobs are skipped, which saves cluster time on failures that every job would share (bad image tag, removed CLI flag, etc.).
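The selection step can be sketched roughly as below. This is a minimal illustration, not the code in this PR: it assumes the search space is a dict keyed by `single_node` and then by sequence-length key, that each entry carries an integer `conc` and an optional `run-eval` flag, and it applies the candidacy rules listed under the design choices. The function name `select_canary` and the example data are hypothetical.

```python
from typing import Any, Optional

# Sequence-length keys eligible for canary selection (per the design choices).
CANARY_SEQ_KEYS = ("1k1k", "8k1k")


def select_canary(search_space: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the eligible single-node entry with the lowest conc, or None."""
    single_node = search_space.get("single_node", {})
    candidates = [
        entry
        for key in CANARY_SEQ_KEYS
        for entry in single_node.get(key, [])
        if not entry.get("run-eval", False)  # eval entries are never canaries
    ]
    # min() returns the first minimal element, so ties resolve deterministically.
    return min(candidates, key=lambda e: e["conc"], default=None)


# Hypothetical search space; key names and values are assumptions.
example = {
    "single_node": {
        "1k1k": [
            {"exp-name": "a", "conc": 64, "run-eval": False},
            {"exp-name": "b", "conc": 4, "run-eval": True},
        ],
        "8k1k": [{"exp-name": "c", "conc": 8, "run-eval": False}],
    },
    "multi_node": {"1k1k": [{"exp-name": "d", "conc": 1}]},
}

canary = select_canary(example)
print(canary["exp-name"])  # c
```

Entry `b` has the lowest `conc` but is excluded because it runs evals, and `d` is excluded because it is not single-node, so `c` is chosen.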

Design choices:

  • Canary candidacy is restricted to single_node['1k1k' | '8k1k'] and excludes entries with run-eval: true, so the canary is always a pure benchmark smoke test using the existing single-node template.
  • The canary entry is removed from the regular fan-out's matrix (via remaining-search-space-config) only when the canary actually succeeded. On canary skip / cancel / canary-select failure, the regular fan-out falls back to the full search-space-config so coverage is preserved.
  • The fan-out gate blocks only on `canary-sweep.result == 'failure'`. Every other state (success, skipped, cancelled) proceeds, so a bug in the canary mechanism never blocks the rest of the sweep.
  • Non-full-sweep PRs, draft PRs, pushes to main, and the reuse path all behave identically to before via existing gates.
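The gate and the matrix fallback described above amount to two small decisions on the canary job's result. The sketch below models them in Python for illustration only; the real logic lives in workflow `if:` expressions, and the function names `fan_out_allowed` and `fan_out_matrix` are hypothetical. The result strings are the four values GitHub Actions reports for a job.

```python
def fan_out_allowed(canary_result: str) -> bool:
    """Block the fan-out only when the canary explicitly failed."""
    return canary_result != "failure"


def fan_out_matrix(canary_result: str, full: list[dict], remaining: list[dict]) -> list[dict]:
    """Drop the canary entry from the matrix only if the canary succeeded."""
    return remaining if canary_result == "success" else full


# Hypothetical configs: "remaining" is "full" minus the canary entry "c".
full = [{"exp-name": "a", "conc": 64}, {"exp-name": "c", "conc": 8}]
remaining = [{"exp-name": "a", "conc": 64}]

for result in ("success", "skipped", "cancelled", "failure"):
    if fan_out_allowed(result):
        names = [e["exp-name"] for e in fan_out_matrix(result, full, remaining)]
        print(result, "->", names)
    else:
        print(result, "-> fan-out skipped")
```

Only `success` uses the reduced matrix, so a skipped or cancelled canary costs no coverage.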

The aggregated results_bmk artifact picks up both the canary's row and the regular fan-out's rows via the existing bmk_* glob — each entry appears exactly once.
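One way to check the exactly-once property on aggregated rows is sketched below. It assumes each result row carries `exp-name`, `isl`, `osl`, and `conc`, and that those fields together identify an entry; both the row shape and the helper names are assumptions, not taken from the aggregation code.

```python
from collections import Counter


def entry_key(row: dict) -> tuple:
    # Hypothetical identity of a benchmark entry.
    return (row["exp-name"], row["isl"], row["osl"], row["conc"])


def duplicated_entries(rows: list[dict]) -> list[tuple]:
    """Return the keys that occur more than once in the merged rows."""
    counts = Counter(entry_key(row) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Canary succeeded, so the fan-out ran without the canary entry.
canary_rows = [{"exp-name": "c", "isl": 8192, "osl": 1024, "conc": 8}]
fanout_rows = [{"exp-name": "a", "isl": 1024, "osl": 1024, "conc": 64}]

dupes = duplicated_entries(canary_rows + fanout_rows)
print(dupes)  # []
```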

Docker-tag-monitor research block split out to #1538.

Test plan

  • Validate that a full-sweep PR runs canary first, then fans out only on canary success
  • Validate that a deliberately-broken full-sweep PR has its fan-out skipped after canary failure
  • Validate that non-full-sweep PRs, draft PRs, and pushes to main are unaffected

Comment on lines +182 to +210
    needs: canary-select
    if: ${{ needs.canary-select.outputs.canary-config != '' && needs.canary-select.outputs.canary-config != '[]' }}
    uses: ./.github/workflows/benchmark-tmpl.yml
    name: canary /
    strategy:
      fail-fast: false
      matrix:
        config: ${{ fromJson(needs.canary-select.outputs.canary-config) }}
    secrets: inherit
    with:
      exp-name: ${{ matrix.config.exp-name }}
      isl: ${{ matrix.config.isl }}
      osl: ${{ matrix.config.osl }}
      max-model-len: ${{ matrix.config.max-model-len }}
      runner: ${{ matrix.config.runner }}
      image: ${{ matrix.config.image }}
      model: ${{ matrix.config.model }}
      model-prefix: ${{ matrix.config.model-prefix }}
      framework: ${{ matrix.config.framework }}
      precision: ${{ matrix.config.precision }}
      tp: ${{ matrix.config.tp }}
      ep: ${{ matrix.config.ep }}
      dp-attn: ${{ matrix.config.dp-attn }}
      conc: ${{ matrix.config.conc }}
      spec-decoding: ${{ matrix.config.spec-decoding }}
      disagg: ${{ matrix.config.disagg }}
      run-eval: false

  sweep-multi-node-1k1k:
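The `if:` guard in the snippet above treats both an empty string and `[]` as "no canary", and the matrix is built with `fromJson`, so `canary-config` has to be a single-line JSON array. A selection step could emit it along the lines of the sketch below. This is an assumption about how `canary-select` might write its outputs, not the PR's code: the function name is hypothetical, and the shape of `remaining-search-space-config` is guessed. Appending `name=value` lines to the file named by `GITHUB_OUTPUT` is the standard GitHub Actions mechanism for step outputs.

```python
import json
import os
import tempfile
from typing import Any, Optional


def write_canary_outputs(
    canary: Optional[dict[str, Any]], remaining: Any, output_path: str
) -> None:
    """Append canary-select outputs as single-line name=value pairs."""
    # A JSON list with zero or one entries, so fromJson() yields a valid matrix.
    canary_config = [canary] if canary is not None else []
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"canary-config={json.dumps(canary_config)}\n")
        fh.write(f"remaining-search-space-config={json.dumps(remaining)}\n")


# Demo against a temporary file; in a workflow step the path would come
# from os.environ["GITHUB_OUTPUT"].
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "github_output")
    write_canary_outputs(None, {"single_node": {}}, out)
    with open(out, encoding="utf-8") as fh:
        lines = fh.read().splitlines()

print(lines[0])  # canary-config=[]
```

With no eligible entry the output is exactly `[]`, which the guard matches, so the canary job is skipped and the fan-out proceeds on the full config.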
Oseltamivir changed the title from "Sequential canary for full-sweep PRs + pre-flight research in docker-tag-monitor" to "run-sweep: gate full-sweep PRs behind a sequential canary" on May 20, 2026
2 participants