[serve] Add max_request_retries to bound router retry loops#64399
[serve] Add max_request_retries to bound router retry loops#64399HrushiYadav wants to merge 2 commits into
Conversation
The router's retry loop retries indefinitely when replicas reject requests. Under sustained load with large payloads, requests pile up in the retry loop causing OOM (num_queued_requests far exceeds max_queued_requests). Add max_request_retries to RequestRouterConfig (default -1 = unlimited for backwards compatibility). When the limit is exceeded, raise BackPressureError (503) to shed load. Both route_and_send_request and _pick_and_reserve_replica retry loops are bounded. Fixes ray-project#61017 Signed-off-by: Hrushikesh Yadav <yadavhrushikesh65@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a configurable limit on the maximum number of times a request can be retried by the router before being dropped with a BackPressureError. It adds max_request_retries to RequestRouterConfig and implements retry tracking and limit checking in AsyncioRouter. The review feedback suggests avoiding accessing the private _deployment_config attribute of RouterMetricsManager inside _check_retry_limit by instead storing and updating _max_queued_requests directly on AsyncioRouter.
| self._initial_backoff_s: Optional[float] = None | ||
| self._backoff_multiplier: Optional[float] = None | ||
| self._max_backoff_s: Optional[float] = None | ||
| self._max_request_retries: int = -1 |
There was a problem hiding this comment.
To avoid accessing the private _deployment_config attribute of RouterMetricsManager inside _check_retry_limit, we should store _max_queued_requests directly on AsyncioRouter and initialize it here.
| self._initial_backoff_s: Optional[float] = None | |
| self._backoff_multiplier: Optional[float] = None | |
| self._max_backoff_s: Optional[float] = None | |
| self._max_request_retries: int = -1 | |
| self._initial_backoff_s: Optional[float] = None | |
| self._backoff_multiplier: Optional[float] = None | |
| self._max_backoff_s: Optional[float] = None | |
| self._max_request_retries: int = -1 | |
| self._max_queued_requests: int = -1 |
| self._max_request_retries = ( | ||
| deployment_config.request_router_config.max_request_retries | ||
| ) |
There was a problem hiding this comment.
Update self._max_queued_requests when the deployment config is updated.
| self._max_request_retries = ( | |
| deployment_config.request_router_config.max_request_retries | |
| ) | |
| self._max_request_retries = ( | |
| deployment_config.request_router_config.max_request_retries | |
| ) | |
| self._max_queued_requests = deployment_config.max_queued_requests |
| def _check_retry_limit(self, num_retries: int) -> None: | ||
| """Raise BackPressureError if the retry limit has been exceeded.""" | ||
| max_retries = self._max_request_retries | ||
| if max_retries >= 0 and num_retries > max_retries: | ||
| deployment_config = self._metrics_manager._deployment_config | ||
| raise BackPressureError( | ||
| num_queued_requests=self._metrics_manager.num_queued_requests, | ||
| max_queued_requests=( | ||
| deployment_config.max_queued_requests | ||
| if deployment_config is not None | ||
| else -1 | ||
| ), | ||
| ) |
There was a problem hiding this comment.
Use the newly stored self._max_queued_requests directly instead of accessing the private _deployment_config attribute of RouterMetricsManager.
def _check_retry_limit(self, num_retries: int) -> None:
"""Raise BackPressureError if the retry limit has been exceeded."""
max_retries = self._max_request_retries
if max_retries >= 0 and num_retries > max_retries:
raise BackPressureError(
num_queued_requests=self._metrics_manager.num_queued_requests,
max_queued_requests=self._max_queued_requests,
)- Add self._max_queued_requests field initialized in __init__ - Update it in update_deployment_config alongside other config fields - Use it in _check_retry_limit instead of accessing private _metrics_manager._deployment_config attribute Signed-off-by: Hrushikesh Yadav <yadavhrushikesh65@gmail.com>
Description
The router's
while Trueretry loop inroute_and_send_requestand_pick_and_reserve_replicaretries indefinitely when replicas reject requests.max_queued_requestsis only checked once at the entry gate, not during retries. Under sustained load with large payloads (~50MB), requests pile up in the retry loop causing OOM --num_queued_requestscan reach 119 whenmax_queued_requests=20.This PR adds
max_request_retriestoRequestRouterConfig:-1(unlimited) preserves backwards compatibilityBackPressureError(503) after exhausting retriesroute_and_send_request(replica rejection retry) and_pick_and_reserve_replica(capacity queue retry) are boundedRequestRouterConfig(max_request_retries=N)or env varRAY_SERVE_ROUTER_MAX_REQUEST_RETRIESRelated issues
Fixes #61017
Additional information
Files changed:
constants.py-- newRAY_SERVE_ROUTER_MAX_REQUEST_RETRIESenv var (follows existing backoff env var pattern)config.py-- newmax_request_retriesfield onRequestRouterConfigwith validator,__eq__,__hash__router.py--_check_retry_limit()helper, retry counter in both loopstest_backpressure.py-- config validation test + integration test withmax_request_retries=3Semantics:
max_request_retries=-1: unlimited (default, no behavior change)max_request_retries=0: no retries (fail on first rejection)max_request_retries=N: retry up to N times, then 503