Description
Background
The OpenAI Responses API offers a WebSocket mode whose main benefit is an incremental inputs fast path: instead of re-sending the full conversation each turn, you set previous_response_id to the prior response and put only the new items (tool outputs + next user message) in input. The server keeps the most recent response in a connection-local in-memory cache.
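To make the difference concrete, here is a minimal sketch of the two request bodies. The item shape is simplified, and the helper names, the model name and the response id are placeholders for illustration; only model, input and previous_response_id are taken from the Responses API itself.

```typescript
// Simplified, hypothetical item shape; the real Responses API defines more item types.
type InputItem =
  | { type: "message"; role: "user" | "assistant"; content: string }
  | { type: "function_call_output"; call_id: string; output: string };

interface ResponsesRequestBody {
  model: string;
  input: InputItem[];
  previous_response_id?: string;
}

// Full-history request: every turn carries the whole conversation.
function buildFullRequest(model: string, conversation: InputItem[]): ResponsesRequestBody {
  return { model, input: [...conversation] };
}

// Incremental request: reference the prior response and send only what is new.
function buildIncrementalRequest(
  model: string,
  previousResponseId: string,
  newItems: InputItem[],
): ResponsesRequestBody {
  return { model, input: [...newItems], previous_response_id: previousResponseId };
}

// Placeholder conversation: two earlier items plus two new ones for this turn.
const earlierItems: InputItem[] = [
  { type: "message", role: "user", content: "What's the weather in Paris?" },
  { type: "message", role: "assistant", content: "Let me check." },
];
const itemsForThisTurn: InputItem[] = [
  { type: "function_call_output", call_id: "call_1", output: '{"tempC": 18}' },
  { type: "message", role: "user", content: "And tomorrow?" },
];

// "example-model" and "resp_123" are placeholders, not real identifiers.
const fullBody = buildFullRequest("example-model", [...earlierItems, ...itemsForThisTurn]);
const incrementalBody = buildIncrementalRequest("example-model", "resp_123", itemsForThisTurn);
```

In this sketch the full body carries four items while the incremental body carries two, and the gap widens with every additional step.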
A WebSocket transport already exists (the ai-sdk-openai-websocket-fetch shim referenced in the docs), but it only delivers connection reuse — the incremental inputs fast path is still unreachable through the SDK, because:
1. The automatic multi-step loop never chains previous_response_id. It is only populated from a caller-supplied providerOptions.openai.previousResponseId; the internal step loop captures response.id into result metadata but does not feed it back into the next request. So every step resends the full accumulated input.
2. Adding previous_response_id at the transport/fetch layer is unsafe given (1): the server would prepend the cached prior turn and receive the full history → duplicated context. Correct incremental inputs require the SDK to both set previous_response_id and trim input to only-new-items, which only the SDK is positioned to do reliably (it owns item transformation: function_call id mapping, reasoning / encrypted-reasoning items, ordering).
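The duplication problem and the trimming it calls for can be sketched as follows. This is a toy model, not the SDK's internals: the server is reduced to "cached prior turn + submitted input", and ChainState, selectNewItems and the item ids are hypothetical names chosen for illustration.

```typescript
// Hypothetical item shape: only an id is needed to show the bookkeeping.
type LoopItem = { id: string };

// Simplified model of the server: context = cached prior turn + submitted input.
function effectiveContext(cachedTurn: LoopItem[], submittedInput: LoopItem[]): LoopItem[] {
  return [...cachedTurn, ...submittedInput];
}

// Tracks how many accumulated items the chained response already covers.
interface ChainState {
  previousResponseId: string | null;
  coveredCount: number;
}

// Returns only the items that are not yet covered by the chained response.
function selectNewItems(accumulated: LoopItem[], state: ChainState): LoopItem[] {
  return state.previousResponseId === null
    ? accumulated
    : accumulated.slice(state.coveredCount);
}

// Counts items whose id already appeared earlier in the list.
function countDuplicates(items: LoopItem[]): number {
  const ids = items.map((item) => item.id);
  return ids.filter((id, index) => ids.indexOf(id) !== index).length;
}

// Step 1 sent the user message and produced a function call; both are now cached.
const cachedTurn: LoopItem[] = [{ id: "user_1" }, { id: "function_call_1" }];
// Step 2: the accumulated input also contains the tool output.
const accumulatedItems: LoopItem[] = [...cachedTurn, { id: "function_call_output_1" }];
const chainState: ChainState = { previousResponseId: "resp_1", coveredCount: cachedTurn.length };

// Transport-level chaining: the id is added but the full input is still sent.
const naiveContext = effectiveContext(cachedTurn, accumulatedItems);
// SDK-level chaining: the id is added and the input is trimmed to new items.
const trimmedContext = effectiveContext(cachedTurn, selectNewItems(accumulatedItems, chainState));
```

Note that coveredCount has to include the previous response's own output items (here the function call), not just what was sent. Accounting for those correctly is part of the item transformation that only the SDK owns.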
Current behaviour
- @ai-sdk/openai responses model sends the full message list on every step.
- previous_response_id is set only from providerOptions.openai.previousResponseId.
- Using the WebSocket fetch shim therefore yields connection reuse but not incremental inputs.
Desired behaviour
An opt-in mode where the automatic step loop chains previous_response_id and sends only new items (the streaming + tool-call loop being the primary win), so the WebSocket incremental-inputs path is actually usable.
Please also consider the reconnect semantics: with store: false / ZDR, a dropped socket (idle timeout, the 60-minute connection cap, or a transport error) invalidates the in-memory chain and an uncached previous_response_id returns previous_response_not_found. Any chaining mode should fall back to resending full context with previous_response_id: null on that error.
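The fallback could look roughly like this. It is a sketch under assumptions: the request and response shapes are reduced to the fields the fallback needs, sendWithFallback is a hypothetical name, and it assumes the transport surfaces the server's error code on a code property of the thrown value.

```typescript
// Hypothetical request/response shapes, reduced to what the fallback needs.
interface ChainedRequest {
  input: string[];
  previous_response_id: string | null;
}
interface ChainedResponse {
  id: string;
}
type SendRequest = (request: ChainedRequest) => Promise<ChainedResponse>;

// Assumption: the transport exposes the server error code on a `code` property.
function isPreviousResponseNotFound(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as { code?: unknown }).code === "previous_response_not_found"
  );
}

// Tries the incremental request first and falls back to the full context
// when the server no longer knows the chained response.
async function sendWithFallback(
  send: SendRequest,
  fullInput: string[],
  newItems: string[],
  previousResponseId: string | null,
): Promise<ChainedResponse> {
  if (previousResponseId === null) {
    // Nothing to chain from: send the full context.
    return send({ input: fullInput, previous_response_id: null });
  }
  try {
    return await send({ input: newItems, previous_response_id: previousResponseId });
  } catch (error) {
    // Any other failure is not ours to handle here.
    if (!isPreviousResponseNotFound(error)) throw error;
    // The chain is gone (for example after a reconnect): resend everything.
    return send({ input: fullInput, previous_response_id: null });
  }
}
```

The retry deliberately sends previous_response_id: null together with the full input, so the new connection starts a fresh chain instead of mixing a stale id with a complete history.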
Use case
Lower-latency multi-step tool loops and long conversations. Using the WebSocket fetch shim we can realize connection-reuse savings, but not incremental inputs, because the SDK resends full input and doesn't chain previous_response_id.
Related
- [...] providerOptions while staying on the previousResponseId chain).
AI SDK Version
- ai@6.0.208
- @ai-sdk/openai@3.0.74