[rollout, trainer, cfg] feat: per-request abort hooks and AbortableLLMServerClient by cr-gao · Pull Request #6865 · verl-project/verl

cr-gao · 2026-06-27T13:08:37Z

What does this PR do?

Adds selective (per-request) abort support to the rollout client path. Today
LLMServerClient exposes no way to interrupt a single in-flight request — only the
broadcast abort_all_requests on the rollout replica exists. This PR introduces
lightweight extension hooks plus a concrete AbortableLLMServerClient that tracks
in-flight requests so a custom AgentLoopWorker/AgentLoopManager can abort one specific
request (e.g. a timed-out rollout, or a custom trajectory-collection policy that decides to
drop a particular sequence).

AI assistance (Claude) was used to write this change; the author has reviewed every line.

Checklist Before Starting

Search for similar PRs. Query: https://github.com/verl-project/verl/pulls?q=is%3Apr+abort+LLMServerClient (none found)

Resolves #6866. Feature discussion / motivation: #6866.

Format the PR title as [{modules}] {type}: {description}

Test

Added tests/workers/rollout/test_abortable_llm_client_on_cpu.py (CPU-only, mocks the Ray
load balancer + server handles — no GPU/vLLM needed):

test_record_abort_and_cleanup — dispatch records the request; abort() targets the
inner vLLM request id (not the outer sticky-session id); completion clears the entry.
test_inflight_cleared_on_normal_completion — normal finish also clears the table.
test_abort_unknown_request_is_noop — aborting an unknown/finished id neither raises nor
hits the server.
test_cancellation_aborts_server_and_clears_inflight — cancelling generate() (e.g.
asyncio.wait_for/timeout) fires _on_cancel, which aborts the server-side request so it
does not leak, and _on_complete still clears the in-flight entry.

pytest tests/workers/rollout/test_abortable_llm_client_on_cpu.py -v

tests/workers/rollout/test_abortable_llm_client_on_cpu.py::test_record_abort_and_cleanup PASSED                  [ 25%]
tests/workers/rollout/test_abortable_llm_client_on_cpu.py::test_inflight_cleared_on_normal_completion PASSED     [ 50%]
tests/workers/rollout/test_abortable_llm_client_on_cpu.py::test_abort_unknown_request_is_noop PASSED             [ 75%]
tests/workers/rollout/test_abortable_llm_client_on_cpu.py::test_cancellation_aborts_server_and_clears_inflight PASSED [100%]

API and Usage Example

New config field actor_rollout_ref.rollout.agent.llm_client_class (a fully qualified class name,
dynamically imported in the same way as the existing agent_loop_manager_class), wired in ray_trainer.py:

actor_rollout_ref:
  rollout:
    agent:
      llm_client_class: verl.workers.rollout.llm_server.AbortableLLMServerClient
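The loading code in ray_trainer.py is not reproduced in this PR description. A minimal sketch of resolving such an FQN string with importlib, falling back to the base client class when the field is unset, might look like this; `load_client_class` and its subclass check are hypothetical and not taken from verl:

```python
import importlib
from typing import Optional


def load_client_class(fqn: Optional[str], default: type) -> type:
    """Resolve a fully qualified class name such as 'pkg.module.ClassName'.

    Hypothetical helper: returns `default` when no FQN is configured, matching
    the "load llm_client_class (if set)" behaviour described in the PR.
    """
    if not fqn:
        return default
    module_name, _, class_name = fqn.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a fully qualified name, got {fqn!r}")
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    # Assumed sanity check: the configured class should extend the default client.
    if not (isinstance(cls, type) and issubclass(cls, default)):
        raise TypeError(f"{fqn} is not a subclass of {default.__name__}")
    return cls
```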

# Inside a custom AgentLoopWorker/Manager holding the client:
# Explicit per-request abort:
await llm_client.abort(request_id)

# Cancellation (e.g. timeout) is handled automatically — the _on_cancel hook aborts the
# server-side request, so this does NOT leak server compute:
await asyncio.wait_for(llm_client.generate(request_id=request_id, ...), timeout=T)
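The automatic timeout path relies on standard asyncio behaviour: on timeout, asyncio.wait_for cancels the wrapped coroutine, so an `except asyncio.CancelledError` clause runs first and the `finally` clause afterwards. The stripped-down stand-in below (`HookOrderDemo` is hypothetical, with a sleep instead of a real server call) records where _on_dispatch, _on_cancel and _on_complete would fire:

```python
import asyncio


class HookOrderDemo:
    """Hypothetical stand-in for generate(): records when each hook would fire."""

    def __init__(self) -> None:
        self.events = []

    async def generate(self, request_id: str) -> str:
        self.events.append(("dispatch", request_id))
        try:
            await asyncio.sleep(10)  # stands in for awaiting server-side generation
            return "done"
        except asyncio.CancelledError:
            # _on_cancel position: the in-flight entry has not been cleared yet.
            self.events.append(("cancel", request_id))
            raise  # re-raise so wait_for can report the timeout
        finally:
            # _on_complete position: runs on success, error, and cancellation.
            self.events.append(("complete", request_id))


async def main():
    demo = HookOrderDemo()
    try:
        await asyncio.wait_for(demo.generate("req-1"), timeout=0.05)
    except asyncio.TimeoutError:
        pass  # wait_for turns the cancellation into a timeout for the caller
    return demo.events


print(asyncio.run(main()))
# [('dispatch', 'req-1'), ('cancel', 'req-1'), ('complete', 'req-1')]
```

Because the cancel hook fires before the cleanup hook, a subclass can still look up the server and inner request id when it needs to send the abort.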

Design & Code Changes

llm_server.py:
- Hoist the inner per-turn vLLM request_id into a named inner_request_id so subclasses
  can record it.
- Add no-op hooks _on_dispatch(request_id, inner_request_id, server) (before awaiting
  generation), _on_complete(request_id) (in finally), and _on_cancel(request_id)
  (in except asyncio.CancelledError, before the finally, so the in-flight entry still
  exists).
- Add AbortableLLMServerClient(LLMServerClient): tracks request_id -> (inner_request_id, server)
  in _inflight, with _on_complete as the sole cleanup point. abort(request_id) and _on_cancel
  share a _send_abort helper that sends a targeted abort_request to the right server: abort()
  awaits it, while _on_cancel is fire-and-forget because awaiting during cancellation is unsafe.
  Inherits the base (non-resuming) generate, i.e. hard-abort semantics.
workers/config/rollout.py: add llm_client_class field to AgentLoopConfig.
trainer/ppo/ray_trainer.py: load llm_client_class (if set) and pass it as client_cls
to get_client().
agent_loop.py: fix typo chunkes -> chunks.
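A self-contained, CPU-only sketch of the hook wiring described above may help. All class names (`LLMServerClientSketch`, `AbortableClientSketch`, `FakeServer`, `_RemoteMethod`), the `f"{request_id}/turn"` inner-id scheme, and the awaitable `server.generate` are assumptions for illustration; only the hook names, `_inflight`, `_send_abort`, `abort()` and `abort_request.remote(...)` come from the PR text and review thread. The fake server mimics a Ray handle, where `.remote()` returns an awaitable.

```python
import asyncio
from typing import Any, Dict, Optional, Tuple


class LLMServerClientSketch:
    """Hypothetical, simplified stand-in for the base client's hook structure."""

    async def generate(self, request_id: str, server: Any) -> str:
        # Hypothetical per-turn id; the real client assigns its own inner vLLM request_id.
        inner_request_id = f"{request_id}/turn"
        self._on_dispatch(request_id, inner_request_id, server)
        try:
            # Assumption: server.generate is a coroutine (the real client goes through Ray).
            return await server.generate(inner_request_id)
        except asyncio.CancelledError:
            self._on_cancel(request_id)  # runs before finally, so the entry still exists
            raise
        finally:
            self._on_complete(request_id)

    # Default hooks are no-ops, so the base client behaves as before.
    def _on_dispatch(self, request_id, inner_request_id, server):
        pass

    def _on_complete(self, request_id):
        pass

    def _on_cancel(self, request_id):
        pass


class AbortableClientSketch(LLMServerClientSketch):
    def __init__(self) -> None:
        self._inflight: Dict[str, Tuple[str, Any]] = {}  # request_id -> (inner id, server)

    def _on_dispatch(self, request_id, inner_request_id, server):
        self._inflight[request_id] = (inner_request_id, server)

    def _on_complete(self, request_id):
        self._inflight.pop(request_id, None)  # sole cleanup point

    def _send_abort(self, request_id: str) -> Optional[Any]:
        entry = self._inflight.get(request_id)
        if entry is None:
            return None  # unknown or finished request: no server call
        inner_request_id, server = entry
        # Abort the inner per-turn id, not the outer sticky-session id.
        return server.abort_request.remote(inner_request_id)

    async def abort(self, request_id: str) -> None:
        ref = self._send_abort(request_id)
        if ref is not None:
            await ref  # explicit abort waits for the call to complete

    def _on_cancel(self, request_id):
        self._send_abort(request_id)  # fire-and-forget: not awaited during cancellation


class _RemoteMethod:
    """Mimics a Ray actor method: .remote(...) runs the call and returns an awaitable."""

    def __init__(self, fn):
        self._fn = fn

    def remote(self, *args):
        fut = asyncio.get_running_loop().create_future()
        fut.set_result(self._fn(*args))
        return fut


class FakeServer:
    """CPU-only stand-in for a server handle; generation blocks until aborted."""

    def __init__(self) -> None:
        self.aborted = []
        self._events: Dict[str, asyncio.Event] = {}
        self.abort_request = _RemoteMethod(self._abort)

    def _abort(self, inner_request_id: str) -> None:
        self.aborted.append(inner_request_id)
        self._events.setdefault(inner_request_id, asyncio.Event()).set()

    async def generate(self, inner_request_id: str) -> str:
        await self._events.setdefault(inner_request_id, asyncio.Event()).wait()
        return "aborted"


async def demo():
    client, server = AbortableClientSketch(), FakeServer()
    task = asyncio.create_task(client.generate("session-1", server))
    await asyncio.sleep(0)  # let generate() dispatch and register the request
    await client.abort("session-1")
    await client.abort("unknown")  # no-op: nothing in flight under that id
    return await task, server.aborted, client._inflight


print(asyncio.run(demo()))  # ('aborted', ['session-1/turn'], {})
```

In this sketch, removal happens in exactly one place: explicit aborts and cancellations only send the abort, and the entry disappears when generate() reaches its finally clause.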

Checklist Before Submitting

Read the Contribute Guide.
Apply pre-commit checks. (All hooks pass; the local check-naming-conventions false
positive only triggers on vendored files under .venv, which is gitignored and absent in CI.)
Add / Update the documentation.
Add unit or end-to-end test(s) to cover the code.
Send a message in the ci-request channel once ready for CI.

🤖 Generated with Claude Code

…MServerClient

Add selective per-request abort to the rollout client path: extension hooks on LLMServerClient plus a concrete AbortableLLMServerClient, selectable via the new actor_rollout_ref.rollout.agent.llm_client_class config. Also fix a typo (chunkes -> chunks) in agent_loop.py.

Co-authored-by: Claude
Signed-off-by: cr-gao <gaochenrui@sjtu.edu.cn>

CLAassistant · 2026-06-27T13:08:44Z

All committers have signed the CLA.

gemini-code-assist

Code Review

This pull request introduces AbortableLLMServerClient to support per-request aborts and in-flight request tracking, adds corresponding CPU unit tests, and allows configuring a custom LLMServerClient class. It also fixes a minor typo in the agent loop. The review feedback highlights a critical issue where client-side task cancellation (such as timeouts) would silently leak GPU resources on the vLLM server because the server-side request is not aborted; the reviewer suggests introducing and implementing an _on_cancel hook to safely abort the request upon cancellation.


gemini-code-assist · 2026-06-27T13:10:11Z

        finally:
+            self._on_complete(request_id)
            self._release_server(server_id)

+    def _on_dispatch(self, request_id, inner_request_id, server):
+        """Hook fired after the inner vLLM request_id is assigned, before awaiting generation.
+        Default no-op; subclasses may record (request_id/inner_request_id, server) to enable abort."""
+
+    def _on_complete(self, request_id):
+        """Hook fired when generation finishes or raises. Default no-op."""


If the client-side generate task is cancelled (e.g., due to a timeout via asyncio.timeout or asyncio.wait_for), the finally block runs immediately and removes the request from _inflight via _on_complete. However, the server-side request is never aborted, leading to silent GPU resource leaks on the vLLM server.

To fix this, we should catch asyncio.CancelledError in generate, trigger a new _on_cancel hook to abort the request on the server, and then re-raise the cancellation.

Suggested change

        except asyncio.CancelledError:
            self._on_cancel(request_id)
            raise
        finally:
            self._on_complete(request_id)
            self._release_server(server_id)

    def _on_dispatch(self, request_id, inner_request_id, server):
        """Hook fired after the inner vLLM request_id is assigned, before awaiting generation.
        Default no-op; subclasses may record (request_id/inner_request_id, server) to enable abort."""

    def _on_complete(self, request_id):
        """Hook fired when generation finishes or raises. Default no-op."""

    def _on_cancel(self, request_id):
        """Hook fired when generation is cancelled. Default no-op."""

Good catch. asyncio.wait_for/timeout cancellation indeed leaked the server-side request, and that's exactly one of the motivating use cases. Added an _on_cancel(request_id) hook fired on CancelledError (before the finally so the in-flight entry still exists), implemented as a fire-and-forget abort in AbortableLLMServerClient (we can't safely await during cancellation), with abort() and _on_cancel sharing a _send_abort helper. Also added a regression test for the cancellation path. Done in aeadf2f.

gemini-code-assist · 2026-06-27T13:10:11Z

+    def _on_complete(self, request_id):
+        # Sole owner of removal: fires from generate()'s finally on success, abort, or error.
+        self._inflight.pop(request_id, None)


Implement the _on_cancel hook in AbortableLLMServerClient to automatically abort the request on the server when the client-side task is cancelled. This prevents GPU resource leaks on the vLLM server during timeouts or cancellations.

    def _on_complete(self, request_id):
        # Sole owner of removal: fires from generate()'s finally on success, abort, or error.
        self._inflight.pop(request_id, None)

    def _on_cancel(self, request_id):
        entry = self._inflight.get(request_id)
        if entry is not None:
            inner_request_id, server = entry
            server.abort_request.remote(inner_request_id)

Implemented _on_cancel in AbortableLLMServerClient as fire-and-forget (_send_abort, no await during cancellation); removal stays owned by _on_complete. Done in aeadf2f.
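With a Ray actor handle, `server.abort_request.remote(...)` submits the call and returns an ObjectRef immediately, so simply not awaiting it is already fire-and-forget. If a server handle instead exposed a plain coroutine (an assumption, not how the PR's handles work), the equivalent would be to schedule a task and keep a strong reference until it finishes, as the asyncio documentation recommends for `create_task`. `BackgroundAborts` below is a hypothetical helper:

```python
import asyncio
from typing import Awaitable, Callable, Set


class BackgroundAborts:
    """Hypothetical helper: schedule abort coroutines without awaiting them."""

    def __init__(self, abort_fn: Callable[[str], Awaitable[None]]) -> None:
        self._abort_fn = abort_fn
        self._pending: Set[asyncio.Task] = set()

    def fire_and_forget(self, inner_request_id: str) -> None:
        task = asyncio.get_running_loop().create_task(self._abort_fn(inner_request_id))
        # Keep a strong reference so the task is not garbage-collected mid-flight.
        self._pending.add(task)
        task.add_done_callback(self._pending.discard)


async def main():
    sent = []

    async def abort_request(inner_request_id: str) -> None:
        await asyncio.sleep(0)  # stands in for a network round trip
        sent.append(inner_request_id)

    aborts = BackgroundAborts(abort_request)

    async def generate():
        try:
            await asyncio.sleep(10)
        except asyncio.CancelledError:
            aborts.fire_and_forget("req-1/turn")  # no await inside the cancel handler
            raise

    try:
        await asyncio.wait_for(generate(), timeout=0.01)
    except asyncio.TimeoutError:
        pass
    await asyncio.sleep(0.01)  # give the background abort a chance to run
    return sent, len(aborts._pending)


print(asyncio.run(main()))  # (['req-1/turn'], 0)
```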

Add an _on_cancel hook fired on asyncio.CancelledError so AbortableLLMServerClient fire-and-forget aborts the server-side request when generate() is cancelled (e.g. asyncio.wait_for/timeout), preventing silent GPU resource leaks. Add a regression test for the cancellation path. Addresses gemini-code-assist review feedback on verl-project#6865.

Co-authored-by: Claude
Signed-off-by: cr-gao <gaochenrui@sjtu.edu.cn>

cr-gao requested review from ArronHZG, PeterSH6, eric-haibin-lin, tongyx361, vermouth1992 and wuxibin89 as code owners June 27, 2026 13:08

gemini-code-assist Bot reviewed Jun 27, 2026


cr-gao mentioned this pull request Jun 27, 2026

[Feature request] Per-request abort for rollout LLMServerClient #6866

Open

cr-gao wants to merge 2 commits into verl-project:main from cr-gao:feat/llm-client-abort-hooks

cr-gao commented Jun 27, 2026 •

edited

Loading

Uh oh!

CLAassistant commented Jun 27, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 27, 2026

Uh oh!

cr-gao Jun 27, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot Jun 27, 2026

Uh oh!

cr-gao Jun 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

cr-gao commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

Checklist Before Starting

Test

API and Usage Example

Design & Code Changes

Checklist Before Submitting

Uh oh!

CLAassistant commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

cr-gao Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

cr-gao Jun 27, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cr-gao commented Jun 27, 2026 •

edited

Loading

CLAassistant commented Jun 27, 2026 •

edited

Loading

cr-gao Jun 27, 2026 •

edited

Loading