[AgentGateway] feat: Multiple Linear Chains for Gateway Sessions#66
zhanghuiyao wants to merge 5 commits into
…eway sessions
- Introduced `enable_multiple_chains` configuration option in GatewayActorConfig to toggle multiple chain support.
- Updated GatewayManager and GatewayActor to handle multiple chains.
- Enhanced GatewaySession to manage active chains, including selection, materialization, and closure of chains.
- Implemented chain state management with ChainState and MaterializedChain data classes.
- Refactored message encoding and response handling to accommodate multiple chains.
- Added methods for computing message prefix hashes and managing chain compatibility.
Code Review
This pull request introduces support for managing multiple active linear trajectory chains per gateway session, allowing for branching conversations (e.g., subagents) within a single session. The changes span the framework configuration, gateway actor, and session management, backed by a comprehensive suite of unit tests. The feedback identifies a potential issue where the calculated remaining response budget could become negative if the generated response exceeds the budget, which may cause LLM backends to crash; clamping this value to a minimum of zero is recommended.
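Below is a minimal sketch of the clamping this review recommends. It assumes the remaining budget is the session's response budget minus the response tokens already used. The function name `effective_max_tokens` and its parameters are hypothetical and are not taken from the PR.

```python
from typing import Optional


def effective_max_tokens(response_budget: int, used_response_tokens: int,
                         request_max_tokens: Optional[int] = None) -> int:
    """Hypothetical helper: cap a request's max_tokens by the remaining response budget."""
    # An earlier generation can overshoot the budget, so clamp the remainder at 0
    # instead of passing a negative limit to the LLM backend.
    remaining = max(0, response_budget - used_response_tokens)
    if request_max_tokens is None:
        return remaining
    return min(remaining, request_max_tokens)
```

A result of `0` lets the caller treat the turn as length-exhausted instead of sending a negative `max_tokens` to the backend.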
…ro and update MessageCodec for dynamic system prompts
- Removed enable_multiple_chains parameter from session and gateway configurations.
- Simplified GatewaySession to handle multiple chains without conditional checks.
- Updated MessageCodec to derive system prompts from processors when available.
- Consolidated trajectory handling to streamline the materialization of active chains.
- Enhanced tests for multiple chains to ensure consistent behavior and coverage.
- Removed legacy code related to single chain handling, improving clarity and maintainability.
- Introduced `enable_parallel_session_generation` and `ignore_cch_for_prefix_hash` flags in GatewayActorConfig to control session behavior.
- Updated GatewaySession to handle parallel backend generation and manage inflight generations.
- Enhanced tests to cover new functionality, including concurrent requests and validation of boolean flags.
- Modified message prefix hash computation to optionally ignore CCH markers for prefix comparisons.
- Ensured backward compatibility while implementing new features.
…n decode failures
Summary
Implements per-session multiple linear chains for `GatewaySession`. A gateway session no longer uses one global `message_history` plus one active trajectory as the runtime source of truth. Instead, it keeps multiple append-only linear chains so that sub-agent/system splits, context compaction, repeated prompts, and best-of-N / retry-style flows do not overwrite unrelated trajectory branches.
In the current PR head, multiple chains are the default and only `GatewaySession` state model. Live state is stored in `active_chains`, closed state is stored in `materialized_chains`, and there is no `GatewayActorConfig.enable_multiple_chains` rollout flag. The PR also includes two M2 opt-in follow-ups that are now implemented in this branch: `enable_parallel_session_generation` (default `False`) for same-session backend concurrency experiments, and `ignore_cch_for_prefix_hash` (default `False`) for Claude Code `cch=` prefix-hash compatibility. M3 branch identity and idempotency remain design-only and are not implemented here.
What's in this PR
Default multiple-chain state model
- `ChainState` for one active linear chain with its own message history, stable prefix hashes, tool/schema compatibility state, effective chat-template kwargs, trajectory buffer, media state, logprob-completeness state, and sequence metadata.
- `MaterializedChain` so early-closed chains and still-active chains can be sorted together at finalize.
- `active_chains` and `materialized_chains` as the authoritative session model.
Chain selection, lifecycle, and materialization
- `(updated_seq, chain_id)` tie-breaks.
- `chat_template_kwargs`, and canonical message-prefix hash matching.
- `order_seq`.
- `0` before applying request `max_tokens`.
- `response_logprobs` is either aligned with `response_ids` or emitted as `None`.
Prefix matching and codec compatibility
- `MessageCodec.canonicalize_message_for_prefix_comparison()` for both incoming request messages and committed assistant messages.
- `tool_calls[].id` and `tool_call_id` in prefix-hash semantics, so rewriting committed tool-call ids splits instead of reusing an unsafe buffer.
- `MessageCodec.effective_chat_template_kwargs()` and routes full/incremental encoding through the same effective-kwargs merge rule.
M2 opt-in same-session backend concurrency
- `enable_parallel_session_generation=False` as an explicit opt-in.
- `generation_lock`.
- `request_lock`, while backend generation runs without holding a session lock.
- `generation_id` and sends backend `request_id` as `session_id:generation_id`, avoiding same-session backend request-id collisions.
- `_inflight_generations` for snapshot/debug visibility.
M2 opt-in `cch=` prefix-hash compatibility
- `ignore_cch_for_prefix_hash=False`.
- `cch=[A-Za-z0-9_-]+` string leaves for prefix-hash input when explicitly enabled.
Framework / reward / TQ behavior
- `Trajectory` and TransferQueue schemas unchanged.
- `Trajectory` schema.
- `_score_trajectories()` scores `session_trajectories[-1]` and broadcasts that score to all trajectories from the session.
- `session_trajectories[-1]` timing-dependent, so the parallel flag should only be used when concurrent sibling chains are meant to share the same broadcast reward.
Actor / manager wiring and observability
- `_GatewayActor.create_session()` forwards prompt/response budgets plus M2 flags into `GatewaySession`.
- `build_gateway_manager()` reads `actor_rollout_ref.rollout.custom.agent_framework.enable_parallel_session_generation` and `ignore_cch_for_prefix_hash`.
- `snapshot_state()` reports `num_active_chains`, `active_chain_ids`, active chain tip hashes, and in-flight generation ids for live-session inspection.
Tests (CPU-only)
- `tests/uni_agent/gateway/test_session_multiple_chains_on_cpu.py`: `cch=` normalization.
- `tests/uni_agent/gateway/test_gateway_config_on_cpu.py`: `GatewayActorConfig`.
- `tests/uni_agent/gateway/test_gateway_actor_on_cpu.py`
- `tests/uni_agent/gateway/test_gateway_manager_on_cpu.py`
- `tests/uni_agent/framework/test_generate_sequences_on_cpu.py`
Representative local validation recorded for this PR head:
Design
The M1 goal of this design is to turn `GatewaySession` into a single state model built from multiple append-only linear chains, while keeping the training data plane and the OpenAI-compatible HTTP envelope unchanged. The core principle is: reuse a chain's token buffer only if the incoming history can be proven to be a safe continuation of that chain; otherwise, create a new chain and keep the old chain until finalize.
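The core principle, combined with the selection rule described under prefix matching below, can be sketched as follows. This is a simplified stand-in and not the PR's code. `Chain`, `message_hash`, `prefix_hash`, and `select_chain` are hypothetical names. Plain `json.dumps(sort_keys=True)` stands in for `MessageCodec` normalization and canonicalization. The optional `ignore_cch` path mirrors the M2 `ignore_cch_for_prefix_hash` opt-in by replacing `cch=` tokens inside string leaves with an assumed placeholder.

```python
import hashlib
import json
import re
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Assumed pattern/placeholder for the M2 opt-in; the PR only says "a fixed placeholder".
_CCH_PATTERN = re.compile(r"cch=[A-Za-z0-9_-]+")
_CCH_PLACEHOLDER = "cch=<normalized>"


def _normalize_cch(value: Any) -> Any:
    """Replace cch=... tokens inside every string leaf of a JSON-like value."""
    if isinstance(value, str):
        return _CCH_PATTERN.sub(_CCH_PLACEHOLDER, value)
    if isinstance(value, list):
        return [_normalize_cch(v) for v in value]
    if isinstance(value, dict):
        return {k: _normalize_cch(v) for k, v in value.items()}
    return value


def message_hash(message: dict, ignore_cch: bool = False) -> str:
    """SHA-256 over deterministic JSON (stand-in for MessageCodec canonicalization)."""
    canonical = _normalize_cch(message) if ignore_cch else message
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def prefix_hash(messages: List[dict], ignore_cch: bool = False) -> str:
    """Cumulative hash: fold each message hash into the running prefix hash."""
    acc = hashlib.sha256(b"").hexdigest()
    for message in messages:
        step = acc + message_hash(message, ignore_cch)
        acc = hashlib.sha256(step.encode("ascii")).hexdigest()
    return acc


@dataclass
class Chain:
    chain_id: int
    updated_seq: int
    message_history: List[dict] = field(default_factory=list)
    tools: Optional[list] = None
    chat_template_kwargs: dict = field(default_factory=dict)


def select_chain(chains: List[Chain], messages: List[dict], tools: Optional[list],
                 chat_template_kwargs: dict, ignore_cch: bool = False) -> Optional[Chain]:
    """Return the longest compatible chain whose history is a prefix of `messages`, else None."""
    matches = []
    for chain in chains:
        # Compatibility gates are checked separately; they are not mixed into the hash.
        if chain.tools != tools or chain.chat_template_kwargs != chat_template_kwargs:
            continue
        n = len(chain.message_history)
        if n > len(messages):
            continue
        # Recomputed here for brevity; a real session would cache each chain's tip hash.
        if prefix_hash(messages[:n], ignore_cch) != prefix_hash(chain.message_history, ignore_cch):
            continue
        matches.append(chain)
    if not matches:
        return None  # caller starts a new chain
    # Longest history wins; ties go to the highest (updated_seq, chain_id) (assumed direction).
    return max(matches, key=lambda c: (len(c.message_history), c.updated_seq, c.chain_id))
```

The sketch keeps tools and template kwargs as equality gates instead of hashing them. This follows the design's separation between prefix identity and compatibility state: a tool change rejects a chain even when the message prefix matches.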
The M2 goal of this design is to allow multiple backend generations within the same gateway session to run concurrently as an opt-in mode on top of M1's multiple-chain state model, while keeping chain selection, commit, decode, and accepted-outcome ordering boundaries verifiable.
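Below is a minimal asyncio sketch of the M2 opt-in mode, assuming an async backend callable. `ParallelSessionSketch`, `generate`, `Backend`, and the `gen-N` id format are hypothetical. The sketch shows only the lock boundaries described in this PR:
- a short `request_lock` section before and after generation;
- backend calls made outside any session lock;
- `session_id:generation_id` request ids;
- an `_inflight_generations` record.

It omits chain selection, decoding, and token buffers.

```python
import asyncio
import itertools
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical backend signature: (request_id, context_ids) -> generated token ids.
Backend = Callable[[str, List[int]], Awaitable[List[int]]]


class ParallelSessionSketch:
    """Hypothetical illustration of M2 lock boundaries; not the PR's GatewaySession."""

    def __init__(self, session_id: str, backend: Backend) -> None:
        self.session_id = session_id
        self.backend = backend
        self.request_lock = asyncio.Lock()
        self._generation_counter = itertools.count()
        self._inflight_generations: Dict[str, str] = {}  # generation_id -> request_id
        self.committed: List[Tuple[str, List[int]]] = []

    async def generate(self, context_ids: List[int]) -> List[int]:
        # Short critical section (chain selection would also happen here).
        async with self.request_lock:
            generation_id = f"gen-{next(self._generation_counter)}"
            request_id = f"{self.session_id}:{generation_id}"
            self._inflight_generations[generation_id] = request_id
        try:
            # Backend generation runs without holding any session lock.
            output_ids = await self.backend(request_id, context_ids)
            # Commit-on-success: reached only if the backend call returned.
            async with self.request_lock:
                self.committed.append((request_id, output_ids))
            return output_ids
        finally:
            # Synchronous pop (no await), so no other coroutine can interleave here.
            self._inflight_generations.pop(generation_id, None)
```

The backend call does not hold `request_lock`, so two requests on the same session can generate concurrently. The per-session `generation_id` keeps their backend request ids distinct, and a failed call leaves nothing committed.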
Prefix matching strategy, eligibility gates, and hash strategy
M1 uses a "longest continuable chain" selection rule. For each active chain, the session first checks `active_tool_schemas == tools`, then checks that the request's effective `chat_template_kwargs` equals the chain's effective kwargs, and finally checks whether the incoming normalized message-history prefix hash equals the chain tip hash. Among all matching chains, the chain with the longest `message_history` wins. If multiple chains have the same length, `(updated_seq, chain_id)` is used as the deterministic tie-break. If no active chain matches, the request starts a new chain.
Prefix hashes describe only whether the message history is the same prefix. Tools and effective chat-template kwargs are kept as separate compatibility gates and are not mixed into the message hash. Hash input is the output of `MessageCodec.normalize_request()`, canonicalized by `MessageCodec.canonicalize_message_for_prefix_comparison()`, serialized with deterministic JSON, and hashed with SHA-256. Prefix hashes are cumulative over message hashes. Assistant messages committed after backend generation use the same canonicalization path, so a later request that echoes the committed assistant message can reattach to the same chain. M1 does not use Python's built-in `hash()`, `repr()`, or process-randomized state.
Backend rollout and incremental-token strategy
When a request continues an existing chain, the prepare step clones that chain's `TrajectoryBuffer` plus chain-local media and logprob state; it does not mutate the active chain directly. The session computes `incremental_messages = messages[len(chain.message_history):]` and encodes those messages with `encode_incremental()`. After the length-budget check, it appends the incremental token ids to the working buffer with `response_mask=0`, because those tokens are continuation context rather than this turn's generated answer.
The rollout backend still receives `buffer.prompt_ids + buffer.response_ids` as context. After the backend succeeds, the session appends backend output tokens with `response_mask=1`, decodes the assistant message, and commits under `request_lock`. New chains are allocated and appended; existing chains are replaced in place only if the selected chain is still the same live tip. Backend failure keeps commit-on-success semantics: no chain is modified and no partial trajectory is created.
Boundary conditions and chain preservation
Tool, permission, or template changes must not silently reuse an old token buffer. M1 explicitly gates reuse by tool schemas and effective `chat_template_kwargs`; any caller-side change that affects available tools, system policy, or prompt rendering must show up in those compatibility states. If tools or effective kwargs are incompatible, a request starts a new chain even if the message history looks like a prefix.
In subagent/system-split flows, a subagent with a different system prompt or tool context creates an independent chain. When the caller returns to the main history, the incoming history can match the main chain by prefix hash and continue it, instead of reopening main as a third trajectory. Context compaction or history rewriting that is not a prefix continuation of any active chain also starts a new chain; the previous chains remain available for finalize.
Repeated same-prompt, retry, and best-of-N flows can create sibling chains. If different samples produce different assistant messages, a later request that faithfully echoes one assistant message can select that sibling exactly through the prefix hash. Ambiguity only remains for canonically identical tips or for callers that do not provide enough history or branch identity; M1 handles those cases deterministically with the latest-updated tie-break. M1 does not implement idempotent deduplication or explicit branch selection. Length exhaustion closes only the selected continuation chain, does not append unsent incremental tokens or media, and leaves other active chains alive. Finalize sorts early-closed and active chains together by `order_seq`, preserving the invariant that the last visible session interaction is returned at `session_trajectories[-1]`.
M2 compatibility opt-in for Claude Code `cch=` billing-hash churn
anthropics/claude-code#40652 describes a Claude Code behavior in which historical tool-result text containing `cch=<value>` billing hashes can be rewritten. This changes historical message bytes and can break prompt-cache prefix matching. M1 treats any historical message-content change as semantic and therefore starts a new chain.
M2 adds an explicit opt-in compatibility flag, `ignore_cch_for_prefix_hash`, defaulting to `False`. When enabled, prefix-hash canonicalization normalizes only string leaves matching `cch=[A-Za-z0-9_-]+` to a fixed placeholder before deterministic JSON and SHA-256. This does not rewrite the raw request payload, stored `message_history`, trajectory token buffers, TQ fields, or HTTP responses. It only means the caller accepts `cch=` value churn as non-semantic for chain-selection purposes; any non-`cch=` content change still splits into a new chain.
Scope / deferrals
- No `Trajectory` / TQ schema change: multiple chains materialize as multiple existing trajectories.
- No `enable_multiple_chains` feature flag: multiple chains are the default and only session state model in this branch.
- `enable_parallel_session_generation=True` and should be used only when timing-dependent final-target reward semantics are acceptable.
- No `BranchHints`, no `uni_agent_parent_chain_id`, no `uni_agent_parent_tip_hash`, and no `uni_agent_request_id` behavior in this PR.
- `Trajectory.extra_fields`.
Remaining future work (M3 / framework follow-up)