
[serve] Add max_request_retries to bound router retry loops #64399

Open
HrushiYadav wants to merge 2 commits into ray-project:master from HrushiYadav:fix/serve-router-retry-limit

Conversation

@HrushiYadav


Description

The while True retry loops in route_and_send_request and _pick_and_reserve_replica retry indefinitely when replicas reject requests. max_queued_requests is checked only once, at the entry gate, and never again during retries. Under sustained load with large payloads (~50MB), requests pile up in the retry loop and cause OOM: num_queued_requests can reach 119 when max_queued_requests=20.

This PR adds max_request_retries to RequestRouterConfig:

  • Default -1 (unlimited) preserves backwards compatibility
  • When set to a non-negative integer, the router raises BackPressureError (503) after exhausting retries
  • Both route_and_send_request (replica rejection retry) and _pick_and_reserve_replica (capacity queue retry) are bounded
  • Configurable via RequestRouterConfig(max_request_retries=N) or env var RAY_SERVE_ROUTER_MAX_REQUEST_RETRIES
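
The two configuration paths above can be sketched as follows. This is a minimal, standalone illustration and not the PR's config.py code: `resolve_max_request_retries` and `RouterRetryConfig` are hypothetical names, and two behaviors are assumptions made for the sketch: that the environment variable only supplies the default while an explicit value takes precedence, and that the validator rejects values below -1.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

ENV_VAR = "RAY_SERVE_ROUTER_MAX_REQUEST_RETRIES"
UNLIMITED = -1


def resolve_max_request_retries(
    explicit: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
) -> int:
    """Hypothetical helper: explicit value wins, else the env var, else -1."""
    env = os.environ if env is None else env
    if explicit is not None:
        value = explicit
    else:
        value = int(env.get(ENV_VAR, UNLIMITED))
    # Assumed validation rule: -1 means unlimited, anything lower is invalid.
    if value < UNLIMITED:
        raise ValueError(
            f"max_request_retries must be -1 (unlimited) or >= 0, got {value}"
        )
    return value


@dataclass(frozen=True)
class RouterRetryConfig:
    """Hypothetical stand-in for the relevant slice of RequestRouterConfig."""

    max_request_retries: int = UNLIMITED

    def __post_init__(self) -> None:
        # Reuse the same range check; the empty env keeps this deterministic.
        resolve_max_request_retries(self.max_request_retries, env={})
```

With this sketch, `RouterRetryConfig(max_request_retries=3)` is accepted and `RouterRetryConfig(max_request_retries=-2)` raises ValueError.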

Related issues

Fixes #61017

Additional information

Files changed:

  • constants.py -- new RAY_SERVE_ROUTER_MAX_REQUEST_RETRIES env var (follows existing backoff env var pattern)
  • config.py -- new max_request_retries field on RequestRouterConfig with validator, __eq__, __hash__
  • router.py -- _check_retry_limit() helper, retry counter in both loops
  • test_backpressure.py -- config validation test + integration test with max_request_retries=3

Semantics:

  • max_request_retries=-1: unlimited (default, no behavior change)
  • max_request_retries=0: no retries (fail on first rejection)
  • max_request_retries=N: retry up to N times, then 503
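
These three cases can be illustrated with a small standalone loop. It is a sketch, not the router.py change: `send_with_retry_limit` and `try_send` are hypothetical, the local `BackPressureError` is a stand-in so the example runs without Ray, and the point at which the counter is incremented is an assumption. Only the comparison `num_retries > max_request_retries` is taken from the PR's _check_retry_limit.

```python
from typing import Callable


class BackPressureError(Exception):
    """Local stand-in for Ray Serve's BackPressureError."""


def send_with_retry_limit(
    try_send: Callable[[], bool], max_request_retries: int = -1
) -> int:
    """Call try_send until it succeeds and return the number of attempts.

    try_send is a hypothetical callable that returns True when a replica
    accepts the request and False when it rejects it.
    """
    num_retries = 0
    while True:
        if try_send():
            return num_retries + 1
        num_retries += 1
        # Same comparison as the PR: a negative limit disables the check.
        if max_request_retries >= 0 and num_retries > max_request_retries:
            raise BackPressureError(
                f"giving up after {num_retries} rejections"
            )
```

Under these assumptions, a limit of 0 raises on the first rejection, and a limit of N allows one initial attempt plus N retries before raising.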

The router's retry loop retries indefinitely when replicas reject
requests. Under sustained load with large payloads, requests pile up
in the retry loop causing OOM (num_queued_requests far exceeds
max_queued_requests).

Add max_request_retries to RequestRouterConfig (default -1 = unlimited
for backwards compatibility). When the limit is exceeded, raise
BackPressureError (503) to shed load. Both route_and_send_request and
_pick_and_reserve_replica retry loops are bounded.

Fixes ray-project#61017

Signed-off-by: Hrushikesh Yadav <yadavhrushikesh65@gmail.com>
@HrushiYadav requested a review from a team as a code owner on June 28, 2026 11:19

@gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request introduces a configurable limit on the maximum number of times a request can be retried by the router before being dropped with a BackPressureError. It adds max_request_retries to RequestRouterConfig and implements retry tracking and limit checking in AsyncioRouter. The review feedback suggests avoiding accessing the private _deployment_config attribute of RouterMetricsManager inside _check_retry_limit by instead storing and updating _max_queued_requests directly on AsyncioRouter.

Comment on lines 673 to +676
    self._initial_backoff_s: Optional[float] = None
    self._backoff_multiplier: Optional[float] = None
    self._max_backoff_s: Optional[float] = None
    self._max_request_retries: int = -1

Contributor

medium

To avoid accessing the private _deployment_config attribute of RouterMetricsManager inside _check_retry_limit, we should store _max_queued_requests directly on AsyncioRouter and initialize it here.

Suggested change
      self._initial_backoff_s: Optional[float] = None
      self._backoff_multiplier: Optional[float] = None
      self._max_backoff_s: Optional[float] = None
      self._max_request_retries: int = -1
    + self._max_queued_requests: int = -1

Comment on lines +852 to +854
    self._max_request_retries = (
        deployment_config.request_router_config.max_request_retries
    )

Contributor

medium

Update self._max_queued_requests when the deployment config is updated.

Suggested change
      self._max_request_retries = (
          deployment_config.request_router_config.max_request_retries
      )
    + self._max_queued_requests = deployment_config.max_queued_requests

Comment on lines +1124 to +1136
    def _check_retry_limit(self, num_retries: int) -> None:
        """Raise BackPressureError if the retry limit has been exceeded."""
        max_retries = self._max_request_retries
        if max_retries >= 0 and num_retries > max_retries:
            deployment_config = self._metrics_manager._deployment_config
            raise BackPressureError(
                num_queued_requests=self._metrics_manager.num_queued_requests,
                max_queued_requests=(
                    deployment_config.max_queued_requests
                    if deployment_config is not None
                    else -1
                ),
            )

Contributor

medium

Use the newly stored self._max_queued_requests directly instead of accessing the private _deployment_config attribute of RouterMetricsManager.

    def _check_retry_limit(self, num_retries: int) -> None:
        """Raise BackPressureError if the retry limit has been exceeded."""
        max_retries = self._max_request_retries
        if max_retries >= 0 and num_retries > max_retries:
            raise BackPressureError(
                num_queued_requests=self._metrics_manager.num_queued_requests,
                max_queued_requests=self._max_queued_requests,
            )
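
Taken together, the three suggestions amount to caching max_queued_requests on the router and refreshing it on every config update. The sketch below shows that shape with stub classes so it runs without Ray. `StubDeploymentConfig` and `StubRouter` are hypothetical, the stub flattens max_request_retries onto the deployment config (the PR reads it from request_router_config), and the local `BackPressureError` only mirrors the two keyword arguments used above.

```python
from dataclasses import dataclass


class BackPressureError(Exception):
    """Local stand-in carrying the two fields the PR passes."""

    def __init__(self, num_queued_requests: int, max_queued_requests: int):
        super().__init__(
            f"{num_queued_requests} queued (max {max_queued_requests})"
        )
        self.num_queued_requests = num_queued_requests
        self.max_queued_requests = max_queued_requests


@dataclass
class StubDeploymentConfig:
    """Hypothetical, flattened view of the config fields used here."""

    max_queued_requests: int
    max_request_retries: int


class StubRouter:
    def __init__(self) -> None:
        # Defaults match the review suggestion: -1 until a config arrives.
        self._max_request_retries = -1
        self._max_queued_requests = -1
        self.num_queued_requests = 0

    def update_deployment_config(self, config: StubDeploymentConfig) -> None:
        # Refresh both cached values together so they cannot drift apart.
        self._max_request_retries = config.max_request_retries
        self._max_queued_requests = config.max_queued_requests

    def _check_retry_limit(self, num_retries: int) -> None:
        """Raise BackPressureError if the retry limit has been exceeded."""
        max_retries = self._max_request_retries
        if max_retries >= 0 and num_retries > max_retries:
            raise BackPressureError(
                num_queued_requests=self.num_queued_requests,
                max_queued_requests=self._max_queued_requests,
            )
```

The helper no longer reaches into another object's private state, and the -1 fallback for a missing config moves into the constructor default.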

@ray-gardener (bot) added the serve (Ray Serve Related Issue) and community-contribution (Contributed by the community) labels on Jun 28, 2026
- Add self._max_queued_requests field initialized in __init__
- Update it in update_deployment_config alongside other config fields
- Use it in _check_retry_limit instead of accessing private
  _metrics_manager._deployment_config attribute

Signed-off-by: Hrushikesh Yadav <yadavhrushikesh65@gmail.com>


Development

Successfully merging this pull request may close these issues.

[Serve] Allow disabling router retry

1 participant