Feat/cross rate early exit #605

Open
ushaket wants to merge 2 commits into vllm-project:main from ushaket:feat/cross-rate-early-exit
@ushaket ushaket commented Feb 23, 2026

Contributor

Summary

This PR adds cross-rate early-exit behavior for multi-rate benchmark profiles so benchmarks stop escalating once a terminal failure condition is hit at a lower rate/stream. It also makes rate/stream ordering deterministic (ascending) and updates user-facing docs/help text to reflect multi-rate semantics and skip behavior.

Details

  • Added shared failure-check logic in profile flow to detect terminal scheduler constraints (request_processing=stop_all) from the previous benchmark state.
  • Updated AsyncProfile.next_strategy() to stop scheduling higher rates after a terminal failure; continue normally on stop_local.
  • Updated ConcurrentProfile.next_strategy() with the same early-exit behavior for stream escalation.
  • Updated SweepProfile.next_strategy() so synchronous and throughput always run, with early-exit applied only during async-rate continuation.
  • Sorted multi-value rates/streams ascending in argument resolution for deterministic progression; added warning logs when input order is changed.
  • Updated CLI help and README to document per-profile --rate semantics, multi-value behavior, and failure-triggered skipping.
  • Added unit tests covering:
    • sorting behavior
    • continuation on normal completion (stop_local)
    • early exit on terminal failures (stop_all)
    • sweep-specific phase behavior and edge cases
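
As a rough illustration of the behavior listed above, here is a minimal, self-contained sketch. It is not the code from this PR: `BenchmarkResult`, `resolve_rates`, `should_stop_escalating`, `next_rate`, and `run_all` are hypothetical names invented for the example, and the real `Profile` classes and scheduler state in guidellm will look different. The sketch only assumes what the description states: rates are sorted ascending (with a warning when the order changes), `stop_local` means normal completion, and `stop_all` means a terminal failure that should prevent higher rates from running.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass
class BenchmarkResult:
    """Hypothetical stand-in for the final state of one benchmark run."""

    rate: float
    # "stop_local" = normal completion, "stop_all" = terminal failure
    request_processing: str


def resolve_rates(rates: list[float]) -> list[float]:
    """Sort rates ascending and warn if the caller's order was changed."""
    ordered = sorted(rates)
    if ordered != list(rates):
        logger.warning("Rates reordered from %s to %s", rates, ordered)
    return ordered


def should_stop_escalating(previous: Optional[BenchmarkResult]) -> bool:
    """Return True when the previous run ended with a terminal failure."""
    return previous is not None and previous.request_processing == "stop_all"


def next_rate(
    rates: list[float], completed: list[BenchmarkResult]
) -> Optional[float]:
    """Pick the next rate to run, or None when the profile is finished."""
    previous = completed[-1] if completed else None
    if should_stop_escalating(previous):
        return None  # skip every remaining (higher) rate
    if len(completed) >= len(rates):
        return None  # all rates have been run
    return rates[len(completed)]


def run_all(rates: list[float], fails_at: float) -> list[float]:
    """Simulate a run in which any rate >= fails_at ends with stop_all."""
    ordered = resolve_rates(rates)
    completed: list[BenchmarkResult] = []
    while (rate := next_rate(ordered, completed)) is not None:
        state = "stop_all" if rate >= fails_at else "stop_local"
        completed.append(BenchmarkResult(rate, state))
    return [result.rate for result in completed]
```

In this simulation, `run_all([10, 1, 5], fails_at=5)` runs rates 1 and 5 and never schedules 10, while a run with no failures visits every rate in ascending order. The sweep case described above would additionally skip the failure check after the throughput phase; that is left out here to keep the sketch short.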

Test Plan

  • Run unit tests for benchmark profiles:
    • pytest tests/unit/benchmark/test_profiles.py
  • Sanity-check existing benchmark profile tests still pass:
    • pytest tests/unit/benchmark -k profile
  • Run a multi-rate benchmark and verify higher rates are skipped after a terminal failure:
    • guidellm benchmark ... --profile constant --rate 1 --rate 5 --rate 10

Related Issues


  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

git log

commit 040b463
Author: Uri Shaket <ushaket@redhat.com>
Date: Sun Feb 15 13:56:30 2026 +0200

Add cross-rate early exit for multi-rate benchmark profiles

When running multiple rates (constant, poisson, concurrent profiles) or
sweeping, stop escalating to higher rates if a failure constraint
(over-saturation, max errors, error rate) triggers at a lower rate.

- Sort rates/streams ascending in AsyncProfile and ConcurrentProfile
- Add _should_stop_escalating() on base Profile class using stop_all
  as the failure signal (vs stop_local for normal completions)
- Skip failure check after throughput phase in SweepProfile since
  over-saturation is expected at maximum load
- Log warning when rate order is changed by sorting
- Update CLI help and README with multi-rate documentation
- Add comprehensive unit tests for all profile types

Signed-off-by: Uri Shaket <ushaket@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

commit f9758df
Author: Uri Shaket <ushaket@redhat.com>
Date: Mon Jun 29 16:40:30 2026 +0300

Address review feedback: fix docs, comments, and docstring

- Simplify _should_stop_escalating param docstring (not a precondition)
- Consolidate sweep.py early-exit comments into one clear block
- Revert README multi-rate sentence, move docs to getting-started/benchmark.md
- Add early-exit behavior notes to Concurrent, Constant, Poisson, Sweep sections

Signed-off-by: Uri Shaket <ushaket@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Signed-off-by: Uri Shaket <ushaket@redhat.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor cursoragent@cursor.com
Signed-off-by: Uri Shaket ushaket@redhat.com

@ushaket ushaket force-pushed the feat/cross-rate-early-exit branch from 0d2a434 to 0698383 Compare February 23, 2026 14:27
@sjmonson sjmonson self-requested a review February 23, 2026 18:37
@ushaket ushaket force-pushed the feat/cross-rate-early-exit branch 3 times, most recently from ed54a85 to 6ffc31d Compare February 24, 2026 15:07
@dbutenhof dbutenhof added this to the v0.7.0 milestone Mar 30, 2026
mergify Bot commented Mar 30, 2026

Contributor

@ushaket, this project requires a linear history on feature branches.
Your PR contains merge commits. Please rebase your branch against main
and remove them.

You can do this by running:
git pull --rebase upstream main

@mergify mergify Bot added the needs-rebase label Mar 30, 2026
@ushaket ushaket force-pushed the feat/cross-rate-early-exit branch 2 times, most recently from 6ef8cff to 634455d Compare June 29, 2026 10:38
Comment thread src/guidellm/benchmark/profiles/profile.py Outdated
Comment thread src/guidellm/benchmark/profiles/sweep.py Outdated
Comment thread README.md Outdated
@sjmonson sjmonson modified the milestones: v0.7.0, v0.7.2 Jul 1, 2026
ushaket and others added 2 commits July 2, 2026 16:31
@ushaket ushaket force-pushed the feat/cross-rate-early-exit branch from 871d39a to f9758df Compare July 2, 2026 13:31
Development

Successfully merging this pull request may close these issues.

Stop multi-rate benchmarks after first failure threshold

3 participants