What happened + What you expected to happen
When a `DeploymentHandle` uses `_by_reference=False` (gRPC transport), the `AsyncioRouter`'s request-completion callback never invalidates the queue-length cache entry for a failed replica. After a gRPC failure, power-of-two-choices routing may still send the next request to that replica until either (a) the rejection path on the next request to that replica invalidates the cache, or (b) the controller's long-poll pushes a new replica set.
The actor path (`_by_reference=True`, default) is unaffected.
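To make the failure mode concrete, here is a toy model of power-of-two-choices routing over a queue-length cache. It is not Ray's implementation: `QueueLenCache` and `choose_replica` are hypothetical names, and skipping uncached replicas stands in for the active probing a real router would do. It shows that a stale, low queue length for a failed replica keeps winning the comparison until the entry is invalidated.

```python
import random
from typing import Dict, List, Optional


class QueueLenCache:
    """Hypothetical cache of last-known queue lengths per replica."""

    def __init__(self) -> None:
        self._entries: Dict[str, int] = {}

    def update(self, replica_id: str, queue_len: int) -> None:
        self._entries[replica_id] = queue_len

    def get(self, replica_id: str) -> Optional[int]:
        return self._entries.get(replica_id)

    def invalidate(self, replica_id: str) -> None:
        self._entries.pop(replica_id, None)


def choose_replica(
    replicas: List[str], cache: QueueLenCache, rng: random.Random
) -> Optional[str]:
    """Sample two replicas and pick the one with the shorter cached queue.

    Replicas without a cache entry are skipped in this toy model; a real
    router would actively probe them instead.
    """
    candidates = rng.sample(replicas, k=min(2, len(replicas)))
    cached = [(cache.get(r), r) for r in candidates if cache.get(r) is not None]
    if not cached:
        return None  # caller would fall back to probing the candidates
    return min(cached)[1]


rng = random.Random(0)
cache = QueueLenCache()
cache.update("r1", 0)  # stale entry for a replica whose gRPC request just failed
cache.update("r2", 3)
print(choose_replica(["r1", "r2"], cache, rng))  # r1: the stale entry still wins
cache.invalidate("r1")  # the step the completion callback skips on the gRPC path
print(choose_replica(["r1", "r2"], cache, rng))  # r2
```

With only two replicas both are always sampled, so the output is deterministic; the point is that nothing but invalidation (or a new replica set) removes `r1` from contention.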
Root cause
`AsyncioRouter._process_finished_request` (ray/python/ray/serve/_private/router.py, lines 852 to 878 in 98eecf3) dispatches on the type of the `result` passed to the done-callback:
```python
def _process_finished_request(
    self,
    replica_id: ReplicaID,
    internal_request_id: str,
    replica_actor_id: Optional[ray.ActorID],
    result: Union[Any, RayError],
) -> None:
    if RAY_SERVE_COLLECT_AUTOSCALING_METRICS_ON_HANDLE:
        self._metrics_manager.dec_num_running_requests_for_replica(replica_id)

    # Notify request router that request completed (for cleanup, e.g., token release)
    if self.request_router:
        self.request_router.on_request_completed(replica_id, internal_request_id)

    actor_died_error = self._get_actor_died_error(result)
    if actor_died_error is not None:
        self._handle_actor_died_error(
            replica_id, replica_actor_id, actor_died_error
        )
    elif isinstance(result, ActorUnavailableError):
        # There are network issues, or replica has died but GCS is down so
        # ActorUnavailableError will be raised until GCS recovers. For the
        # time being, invalidate the cache entry so that we don't try to
        # send requests to this replica without actively probing, and retry
        # routing request.
        if self.request_router:
            self.request_router.on_replica_actor_unavailable(replica_id)
```
The shape of result depends on the transport:

| Transport | `_by_reference` | `result` in done-callback | Source |
|---|---|---|---|
| Actor | `True` (default) | Deserialized return value or `RayError` subclass | `ObjectRef._on_completed` → `async_callback` in `_raylet.pyx` |
| gRPC | `False` | `grpc.aio.Call` object | `gRPCReplicaResult.add_done_callback` → `grpc.aio.Call.add_done_callback` |

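On the gRPC path, `result` is a `grpc.aio.Call`, so `_get_actor_died_error` returns `None` and the `isinstance(result, ActorUnavailableError)` check is `False`. Both branches are skipped silently, so neither `_handle_actor_died_error` nor `on_replica_actor_unavailable` runs for the failed replica.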
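The mismatch can be reproduced without Ray. The sketch below uses `asyncio.Future` as a stand-in for `grpc.aio.Call`, since both pass the future/call object itself to a done-callback, and a local `ActorUnavailableError` class in place of Ray's. `process_finished_request` and `RecordingRouter` are hypothetical and model only the `ActorUnavailableError` branch of the excerpt above.

```python
import asyncio
from typing import Any, List


class ActorUnavailableError(Exception):
    """Local stand-in for Ray's ActorUnavailableError (not the real class)."""


class RecordingRouter:
    """Hypothetical router that records which replicas were invalidated."""

    def __init__(self) -> None:
        self.invalidated: List[str] = []

    def on_replica_actor_unavailable(self, replica_id: str) -> None:
        self.invalidated.append(replica_id)


def process_finished_request(router: RecordingRouter, replica_id: str, result: Any) -> None:
    # Like the excerpt, dispatch is purely on the type of `result`.
    if isinstance(result, ActorUnavailableError):
        router.on_replica_actor_unavailable(replica_id)


async def main() -> RecordingRouter:
    router = RecordingRouter()
    loop = asyncio.get_running_loop()

    # Actor-style path: the callback is handed the error object itself.
    process_finished_request(router, "replica-a", ActorUnavailableError("network"))

    # gRPC-style path: asyncio.Future, like grpc.aio.Call, passes the
    # future object to the done-callback, not the outcome it holds.
    fut: asyncio.Future = loop.create_future()
    fut.add_done_callback(lambda f: process_finished_request(router, "replica-b", f))
    fut.set_exception(ActorUnavailableError("network"))
    await asyncio.sleep(0)  # let the loop run the scheduled done-callback
    fut.exception()  # retrieve the exception so asyncio does not log it as unretrieved
    return router


router = asyncio.run(main())
print(router.invalidated)  # ['replica-a']
```

Both requests failed with the same error, but only the one whose callback received the error object triggers invalidation; `replica-b` stays in the cache.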
Versions / Dependencies
master
Reproduction script
N/A
Issue Severity
None