[train] feat: Add AgentFramework base class and OpenAI example implementation by zackcxb · Pull Request #58 · verl-project/uni-agent

zackcxb · 2026-06-12T04:57:19Z

What does this PR do?

This PR adds uni_agent.framework — a trainer-facing OpenAI-compatible agent framework built on top of the gateway runtime from the previous stacked PR. It turns each rollout sample into one or more gateway sessions, lets user runners interact with those sessions through OpenAI-compatible HTTP, finalizes token-level trajectories in the parent process, scores them through RewardLoopWorker, and writes outputs back to the sync trainer's TransferQueue schema.

Specifically:

AgentFrameworkRolloutAdapter — trainer-facing adapter for the agent_loop_manager_class extension point. Recipes can wire the same adapter in yaml without recipe-specific Python glue.
build_agent_framework — shared factory that constructs GatewayServingRuntime, loads tokenizer/processor via HFModelConfig, builds GatewayActorConfig, and delegates framework-specific config to the selected framework class.
OpenAICompatibleAgentFramework — reference implementation that owns batch/prompt/session orchestration, per-session gateway lifecycle, reward scoring, and TransferQueue writes.
Runner registry config — canonical agent_runners mapping from runner name to runner_fqn, runner_kwargs, dispatch_mode, and per-runner max_concurrent_sessions.
Runner contract — typed async callable AgentRunner protocol. Function runners and class runners with async __call__ are both accepted; no ABC inheritance requirement.
Dispatch modes — inline_async for in-process async runners and ray_task for per-session Ray task execution of blocking runners.
Session completion URL — SessionHandle.complete_url lets standalone runners mark completion over HTTP without taking a framework-owned session_runtime dependency.

The framework keeps RL correctness boundaries in the parent/gateway path: token-truth, commit-on-success session state, reward scoring, TransferQueue writes, finalize, and abort remain outside Ray runner workers. Ray workers only import and run the user runner, then return success or propagate exceptions back to the parent.

PR scope

This is the second PR in the stacked series:

gateway — uni_agent.gateway (already merged upstream)
framework (this PR) — uni_agent.framework
deepeyes examples — examples/agent_train/deepeyes_gateway/ (follow-up, stacked on framework)

Only the framework portion is reviewed here. The PR intentionally does not add recipe-specific runners or examples. DeepEyes and SWE integration live in follow-up recipe PRs.

Checklist Before Starting

Search for similar PRs: https://github.com/verl-project/uni-agent/pulls?q=is%3Apr+framework+gateway
Format the PR title: [framework] feat: add OpenAI-compatible agent framework

Test

pytest tests/uni_agent/framework/test_generate_sequences_on_cpu.py \
       tests/uni_agent/framework/test_multi_modal_postprocess_on_cpu.py \
       tests/uni_agent/gateway/test_session_runtime_on_cpu.py \
       tests/uni_agent/gateway/test_gateway_actor_on_cpu.py -q

Critical regression gates included:

test_generate_sequences_writes_tq_schema_for_each_session — writes prompt/session trajectories into the sync trainer TransferQueue schema.
test_generate_sequences_keeps_successful_sessions_when_one_session_fails — aborts failed sessions without dropping successful sibling sessions.
test_agent_runners_registry_selects_runner_by_agent_name — routes multi-runner batches by per-sample agent_name.
test_per_runner_max_concurrent_sessions_caps_only_selected_runner — applies max_concurrent_sessions as a per-runner in-flight cap, not a shared global cap.
test_ray_task_runner_reimports_from_fqn_and_finalizes_in_parent — Ray workers re-import runner FQNs while parent retains finalization and TQ writes.
test_ray_task_exception_aborts_only_that_session — Ray runner exceptions propagate to parent abort handling.
test_score_trajectories_dispatches_only_final_trajectory_and_broadcasts — matches AgentLoopWorkerTQ reward behavior: score final trajectory, broadcast to all session trajectories.
Gateway session-runtime tests cover SessionHandle.complete_url propagation.

API and Usage

Public API:

uni_agent.framework — AgentFramework, AgentRunner, OpenAICompatibleAgentFramework
uni_agent.framework.entry — build_agent_framework, AgentFrameworkRolloutAdapter
uni_agent.gateway.types.SessionHandle — now carries complete_url in addition to base_url

Minimum yaml shape:

actor_rollout_ref:
  rollout:
    agent:
      agent_loop_manager_class: uni_agent.framework.entry.AgentFrameworkRolloutAdapter
    custom:
      agent_framework:
        gateway_count: 8
        agent_runners:
          deepeyes:
            runner_fqn: my_recipe.runners.deepeyes_runner
            runner_kwargs:
              max_turns: 8
            dispatch_mode: inline_async
            max_concurrent_sessions: 0
          swe:
            runner_fqn: my_recipe.runners.swe_runner
            runner_kwargs:
              image: swe-agent-image
            dispatch_mode: ray_task
            max_concurrent_sessions: 16

Runner callable shape:

async def runner(*, session, raw_prompt, sample_index, tools_kwargs=None, **kwargs) -> None:
    response = requests.post(
        f"{session.base_url}/chat/completions",
        json={"model": "default", "messages": raw_prompt},
    )
    requests.post(session.complete_url, json={"reward_info": {"score": 1.0}})

agent_name is framework routing metadata. It selects a runner when multiple agent_runners are configured, but it is not forwarded to user runners. Runner-specific static config belongs in runner_kwargs; per-sample task config belongs in tools_kwargs.

Design & Code Changes

High-level structure:

entry.py — framework factory and trainer adapter. This is transitional glue until the trainer can call build_agent_framework directly.
framework.py — OpenAICompatibleAgentFramework, runner config parsing, dispatch backends, session orchestration, reward scoring, and TQ conversion.
gateway/types.py / gateway tests — add SessionHandle.complete_url.

Session flow:

TensorDict batch -> sample_fields -> rollout.n gateway sessions -> runner dispatch -> wait/finalize in parent -> score final trajectory -> broadcast score -> TransferQueue write.

Dispatch flow:

inline_async: materialize runner in-process at framework construction; run sessions directly on the parent event loop.
ray_task: submit one Ray task per session; Ray worker imports the runner by FQN and runs only the user runner; parent awaits ray.get through an executor so the parent event loop does not block.

Key invariants:

agent_runners is the only supported runner config shape; legacy single-runner config is intentionally rejected.
dispatch_mode is either inline_async or ray_task; no process/subprocess backend is introduced in this PR.
max_concurrent_sessions is per runner. There is no shared global cap.
Runner FQN is retained in framework config so Ray workers can re-import runners independently.
Token accounting, session finalization, reward scoring, abort, and TransferQueue writes stay in parent/gateway code, not inside Ray workers.
agent_name must be a string when multiple runners are configured.

WIP / Follow-up

Let main_ppo_sync.py call build_agent_framework directly so AgentFrameworkRolloutAdapter can retire.
Add recipe PRs for DeepEyes inline runner and SWE ray_task runner.
Consider replacing ad-hoc config parsing with a typed config dataclass once the final yaml path is stable.

Checklist Before Submitting

Read the Contribute Guide.
Add focused CPU tests for framework runner registry, dispatch, session lifecycle, reward scoring, and TQ writes.
Public classes / methods / fields carry docstrings where they define user-facing behavior.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files
Add / update documentation — this PR includes inline docstrings and PR-level usage notes; recipe documentation is deferred to follow-up PRs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces a new trainer-driven agent framework stack, including the OpenAICompatibleAgentFramework implementation, integration adapters, multimodal postprocessing utilities, and comprehensive unit tests. The review feedback highlights several critical robustness and efficiency improvements: correctly handling BaseException (such as asyncio.CancelledError) in gathered tasks to prevent masked errors, awaiting ray.ObjectRef directly to avoid thread pool overhead, maintaining backward compatibility by removing Python 3.10-specific zip arguments, and ensuring consistent position ID initialization for padded tokens. Additionally, defensive checks are recommended for configuration parsing, empty trajectory fields, and reward worker responses.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-12T04:58:59Z

+
+    first_video = videos[0]
+    if isinstance(first_video, tuple) and len(first_video) == 2:
+        split_videos, video_metadata = zip(*videos, strict=False)


The strict argument was introduced to zip in Python 3.10. If this codebase is run on Python 3.8 or 3.9, passing strict=False will raise a TypeError. Since strict=False is the default behavior of zip, specifying it is redundant and harms backward compatibility.

Suggested change

split_videos, video_metadata = zip(*videos, strict=False)

split_videos, video_metadata = zip(*videos)

gemini-code-assist · 2026-06-12T04:58:59Z

+    vision_position_ids = vision_position_ids.transpose(0, 1)
+
+    valid_mask = attention_mask[0].bool()
+    text_position_ids = torch.ones((1, input_ids.shape[1]), dtype=torch.long, device=input_ids.device)


In compute_position_ids, text_position_ids is initialized with torch.ones. Consequently, any padded/invalid tokens (where valid_mask is False) will retain a position ID of 1. In contrast, the text-only path (using compute_position_id_with_mask) assigns 0 to padded tokens. To maintain consistency and avoid potential issues with position embeddings for padded tokens, initialize text_position_ids with torch.zeros instead.

Suggested change

text_position_ids = torch.ones((1, input_ids.shape[1]), dtype=torch.long, device=input_ids.device)

text_position_ids = torch.zeros((1, input_ids.shape[1]), dtype=torch.long, device=input_ids.device)

gemini-code-assist · 2026-06-12T04:58:59Z

+    session_runtime = GatewayServingRuntime(
+        llm_client=llm_client,
+        gateway_count=int(af_cfg["gateway_count"]),
+        gateway_actor_config=gateway_actor_config,
+    )


If gateway_count is missing from the configuration, af_cfg["gateway_count"] will raise a cryptic KeyError. It is safer to use .get("gateway_count") and raise a descriptive ValueError if it is not configured.

gateway_count = af_cfg.get("gateway_count") if gateway_count is None: raise ValueError("agent_framework.gateway_count is required in the configuration") session_runtime = GatewayServingRuntime( llm_client=llm_client, gateway_count=int(gateway_count), gateway_actor_config=gateway_actor_config, )

gemini-code-assist · 2026-06-12T04:58:59Z

+def _list_of_tq_fields_to_tensordict(fields: list[dict[str, object]]) -> TensorDict:
+    td = tu.list_of_dict_to_tensordict(fields)
+    for key in _TQ_NESTED_SEQUENCE_FIELDS:
+        if key not in fields[0]:


If fields is an empty list, accessing fields[0] will raise an IndexError. Although current callers filter out empty trajectories, adding a defensive guard at the beginning of _list_of_tq_fields_to_tensordict makes the utility function robust and prevents potential crashes if called with empty inputs in the future.

Suggested change

def _list_of_tq_fields_to_tensordict(fields: list[dict[str, object]]) -> TensorDict:

td = tu.list_of_dict_to_tensordict(fields)

for key in _TQ_NESTED_SEQUENCE_FIELDS:

if key not in fields[0]:

def _list_of_tq_fields_to_tensordict(fields: list[dict[str, object]]) -> TensorDict:

if not fields:

return TensorDict({}, batch_size=[0])

td = tu.list_of_dict_to_tensordict(fields)

for key in _TQ_NESTED_SEQUENCE_FIELDS:

if key not in fields[0]:

Test cleanup: - Slim test_generate_sequences_on_cpu.py from 890 to 544 lines via: • merge function/class runner dispatch coverage into one registry test • replace repeated config/TQ setup with shared local fixtures and builder • drop policy-only legacy-config, unsettled concurrency, and duplicate failure tests • merge optional trainer-field zero-fill checks and move finish_reason into schema coverage - Slim test_multi_modal_postprocess_on_cpu.py from 93 to 76 lines by merging overlapping text-path coverage and adding concise docstrings. Lint: ruff/ruff-format/mypy/compileall on PR-scope files. Auto-fixes applied; no behavior change. Regression: tests/uni_agent/framework/ + tests/uni_agent/gateway/ pass with 52 passed after the test cleanup. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Co-authored-by: Codex <noreply@openai.com>

Wrap the framework in an AgentFrameworkWorker Ray actor so generate_sequences dispatches non-blocking. Build the gateway runtime driver-side and inject it so its actors stay driver-owned rather than subordinate to the framework worker; framework construction is synchronous (no async setup round-trip). Drop the pre-V1 main_ppo_sync ReplayBuffer 'running' marker compatibility from the adapter; prompt-status registration moves to the V1 trainer path. Co-authored-by: OpenAI Codex <noreply@openai.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…anager Collect the session domain (session/codec/protocol/types) into a gateway/session subpackage so the top level holds only the HTTP gateway and its driver-side manager. Merge GatewayServingRuntime into GatewayManager: the manager now owns actor spawn/shutdown plus session routing under one clear name (matching verl's AgentLoopManager), routed only through create/finalize/abort. Drop accumulated dead surface: the unused GatewayManager.set_reward_info (reward already flows Agent -> HTTP -> actor -> session), the test-only from_actors constructor, and the single-implementation _GatewayManager Protocol in the framework. Rename the framework's injected dependency session_runtime -> gateway_manager to name what it actually is. Rewrite the routing tests onto the real spawn path: one ownership round-trip test (finalize returns each session's own trajectory) replaces the least-load policy lock, and the two manager test files are merged into one. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

GatewayManager.create_session selected the least-loaded gateway but only incremented active_sessions_per_gateway AFTER the create await. Sessions are created concurrently on one event loop, so a burst of coroutines all ran the selection before any increment landed, read the same stale all-zero counts, and min() funneled every session onto the lowest-index gateway -- observed in SWE coding runs where long-lived sessions make the imbalance persist. Reserve the slot (route + count) synchronously before the await, rolling back if the remote create raises so failed sessions do not inflate the load estimate. Add a concurrent-creation balance test; prior routing tests created sessions sequentially, so each await (and its increment) completed before the next selection and the race never surfaced. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

feat(framework): OpenAI-compatible agent framework on gateway runtime

94e8054

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

zackcxb changed the title ~~Pr25 framework v2~~ [train]Add AgentFramework base class and OpenAI example implementation Jun 12, 2026

gemini-code-assist Bot reviewed Jun 12, 2026

View reviewed changes

chore(framework): polish runner dispatch for PR

d83de96

zackcxb force-pushed the pr25-framework-v2 branch from 44748af to d83de96 Compare June 12, 2026 08:20

zackcxb marked this pull request as ready for review June 14, 2026 07:01

zackcxb changed the title ~~[train]Add AgentFramework base class and OpenAI example implementation~~ [train] feat: Add AgentFramework base class and OpenAI example implementation Jun 14, 2026

zackcxb and others added 3 commits June 16, 2026 09:10

feat(framework): support TQ adapter without replay buffer

658b71e

refactor(gateway): replace complete with reward info

4ad6510

Co-authored-by: Codex <noreply@openai.com>

fix(framework): tighten async dispatch error handling

33a8ab8

zackcxb force-pushed the pr25-framework-v2 branch from 4a69f4d to 33a8ab8 Compare June 16, 2026 14:12

zhaizhiqiangA assigned zhaizhiqiangA, wuxibin89 and yyDing1 Jun 17, 2026

wuxibin89 reviewed Jun 17, 2026

View reviewed changes

Comment thread uni_agent/framework/entry.py

wuxibin89 reviewed Jun 17, 2026

View reviewed changes

Comment thread uni_agent/framework/entry.py Outdated

zhaizhiqiangA reviewed Jun 17, 2026

View reviewed changes

Comment thread uni_agent/gateway/runtime.py Outdated

zackcxb force-pushed the pr25-framework-v2 branch from 4c0b409 to f2dfe4b Compare June 18, 2026 04:19

zackcxb and others added 2 commits June 18, 2026 04:31

zackcxb force-pushed the pr25-framework-v2 branch from f2dfe4b to 5dfeb55 Compare June 18, 2026 04:43

wuxibin89 approved these changes Jun 18, 2026

View reviewed changes

wuxibin89 merged commit ee61374 into verl-project:main Jun 18, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[train] feat: Add AgentFramework base class and OpenAI example implementation#58

[train] feat: Add AgentFramework base class and OpenAI example implementation#58
wuxibin89 merged 9 commits into
verl-project:mainfrom
zackcxb:pr25-framework-v2

zackcxb commented Jun 12, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	split_videos, video_metadata = zip(*videos, strict=False)
	split_videos, video_metadata = zip(*videos)

	text_position_ids = torch.ones((1, input_ids.shape[1]), dtype=torch.long, device=input_ids.device)
	text_position_ids = torch.zeros((1, input_ids.shape[1]), dtype=torch.long, device=input_ids.device)

Uh oh!

Conversation

zackcxb commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do?

PR scope

Checklist Before Starting

Test

API and Usage

Design & Code Changes

WIP / Follow-up

Checklist Before Submitting

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

zackcxb commented Jun 12, 2026 •

edited

Loading