[rollout, trainer, cfg] feat: per-request abort hooks and AbortableLLMServerClient#6865
[rollout, trainer, cfg] feat: per-request abort hooks and AbortableLLMServerClient#6865cr-gao wants to merge 2 commits into
Conversation
…MServerClient Add selective per-request abort to the rollout client path: extension hooks on LLMServerClient plus a concrete AbortableLLMServerClient, selectable via the new actor_rollout_ref.rollout.agent.llm_client_class config. Also fix a typo (chunkes -> chunks) in agent_loop.py. Co-authored-by: Claude Signed-off-by: cr-gao <gaochenrui@sjtu.edu.cn>
There was a problem hiding this comment.
Code Review
This pull request introduces AbortableLLMServerClient to support per-request aborts and in-flight request tracking, adds corresponding CPU unit tests, and allows configuring a custom LLMServerClient class. It also fixes a minor typo in the agent loop. The review feedback highlights a critical issue where client-side task cancellation (such as timeouts) would silently leak GPU resources on the vLLM server because the server-side request is not aborted; the reviewer suggests introducing and implementing an _on_cancel hook to safely abort the request upon cancellation.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| finally: | ||
| self._on_complete(request_id) | ||
| self._release_server(server_id) | ||
|
|
||
| def _on_dispatch(self, request_id, inner_request_id, server): | ||
| """Hook fired after the inner vLLM request_id is assigned, before awaiting generation. | ||
| Default no-op; subclasses may record (request_id/inner_request_id, server) to enable abort.""" | ||
|
|
||
| def _on_complete(self, request_id): | ||
| """Hook fired when generation finishes or raises. Default no-op.""" |
There was a problem hiding this comment.
If the client-side generate task is cancelled (e.g., due to a timeout via asyncio.timeout or asyncio.wait_for), the finally block runs immediately and removes the request from _inflight via _on_complete. However, the server-side request is never aborted, leading to silent GPU resource leaks on the vLLM server.
To fix this, we should catch asyncio.CancelledError in generate, trigger a new _on_cancel hook to abort the request on the server, and then re-raise the cancellation.
| finally: | |
| self._on_complete(request_id) | |
| self._release_server(server_id) | |
| def _on_dispatch(self, request_id, inner_request_id, server): | |
| """Hook fired after the inner vLLM request_id is assigned, before awaiting generation. | |
| Default no-op; subclasses may record (request_id/inner_request_id, server) to enable abort.""" | |
| def _on_complete(self, request_id): | |
| """Hook fired when generation finishes or raises. Default no-op.""" | |
| except asyncio.CancelledError: | |
| self._on_cancel(request_id) | |
| raise | |
| finally: | |
| self._on_complete(request_id) | |
| self._release_server(server_id) | |
| def _on_dispatch(self, request_id, inner_request_id, server): | |
| """Hook fired after the inner vLLM request_id is assigned, before awaiting generation. | |
| Default no-op; subclasses may record (request_id/inner_request_id, server) to enable abort.""" | |
| def _on_complete(self, request_id): | |
| """Hook fired when generation finishes or raises. Default no-op.""" | |
| def _on_cancel(self, request_id): | |
| """Hook fired when generation is cancelled. Default no-op.""" |
There was a problem hiding this comment.
Good catch. asyncio.wait_for/timeout cancellation indeed leaked the server-side request, and that's exactly one of the motivating use cases. Added an _on_cancel(request_id) hook fired on CancelledError (before the finally so the in-flight entry still exists), implemented as a fire-and-forget abort in AbortableLLMServerClient (we can't safely await during cancellation), with abort() and _on_cancel sharing a _send_abort helper. Also added a regression test for the cancellation path. Done in aeadf2f.
| def _on_complete(self, request_id): | ||
| # Sole owner of removal: fires from generate()'s finally on success, abort, or error. | ||
| self._inflight.pop(request_id, None) |
There was a problem hiding this comment.
Implement the _on_cancel hook in AbortableLLMServerClient to automatically abort the request on the server when the client-side task is cancelled. This prevents GPU resource leaks on the vLLM server during timeouts or cancellations.
def _on_complete(self, request_id):
# Sole owner of removal: fires from generate()'s finally on success, abort, or error.
self._inflight.pop(request_id, None)
def _on_cancel(self, request_id):
entry = self._inflight.get(request_id)
if entry is not None:
inner_request_id, server = entry
server.abort_request.remote(inner_request_id)There was a problem hiding this comment.
Implemented _on_cancel in AbortableLLMServerClient as fire-and-forget (_send_abort, no await during cancellation); removal stays owned by _on_complete. Done in aeadf2f.
Add an _on_cancel hook fired on asyncio.CancelledError so AbortableLLMServerClient fire-and-forget aborts the server-side request when generate() is cancelled (e.g. asyncio.wait_for/timeout), preventing silent GPU resource leaks. Add a regression test for the cancellation path. Addresses gemini-code-assist review feedback on verl-project#6865. Co-authored-by: Claude Signed-off-by: cr-gao <gaochenrui@sjtu.edu.cn>
What does this PR do?
Adds selective (per-request) abort support to the rollout client path. Today
LLMServerClientexposes no way to interrupt a single in-flight request — only thebroadcast
abort_all_requestson the rollout replica exists. This PR introduceslightweight extension hooks plus a concrete
AbortableLLMServerClientthat tracksin-flight requests so a custom
AgentLoopWorker/AgentLoopManagercan abort one specificrequest (e.g. a timed-out rollout, or a custom trajectory-collection policy that decides to
drop a particular sequence).
AI assistance (Claude) was used to write this change; the author has reviewed every line.
Checklist Before Starting
Resolves #6866. Feature discussion / motivation: #6866.
[{modules}] {type}: {description}Test
Added
tests/workers/rollout/test_abortable_llm_client_on_cpu.py(CPU-only, mocks the Rayload balancer + server handles — no GPU/vLLM needed):
test_record_abort_and_cleanup— dispatch records the request;abort()targets theinner vLLM request id (not the outer sticky-session id); completion clears the entry.
test_inflight_cleared_on_normal_completion— normal finish also clears the table.test_abort_unknown_request_is_noop— aborting an unknown/finished id neither raises norhits the server.
test_cancellation_aborts_server_and_clears_inflight— cancellinggenerate()(e.g.asyncio.wait_for/timeout) fires_on_cancel, which aborts the server-side request so itdoes not leak, and
_on_completestill clears the in-flight entry.API and Usage Example
New config field
actor_rollout_ref.rollout.agent.llm_client_class(FQN, dynamicallyimported — mirrors the existing
agent_loop_manager_class), wired inray_trainer.py:Design & Code Changes
llm_server.py:request_idinto a namedinner_request_idso subclassescan record it.
_on_dispatch(request_id, inner_request_id, server)(before awaitinggeneration),
_on_complete(request_id)(infinally), and_on_cancel(request_id)(in
except asyncio.CancelledError, before thefinally, so the in-flight entry stillexists).
AbortableLLMServerClient(LLMServerClient): tracksrequest_id -> (inner_request_id, server)in_inflight;_on_completeis the sole cleanup point;abort(request_id)and
_on_cancelshare a_send_aborthelper that sends a preciseabort_requestto theright server (
abortawaits it;_on_cancelis fire-and-forget, since awaiting duringcancellation is unsafe). Inherits the base (non-resuming)
generate, i.e. hard-abortsemantics.
workers/config/rollout.py: addllm_client_classfield toAgentLoopConfig.trainer/ppo/ray_trainer.py: loadllm_client_class(if set) and pass it asclient_clsto
get_client().agent_loop.py: fix typochunkes->chunks.Checklist Before Submitting
check-naming-conventionsfalsepositive only triggers on vendored files under
.venv, which is gitignored and absent in CI.)ci-requestchannel once ready for CI.🤖 Generated with Claude Code