Support OpenAI Responses incremental inputs (automatic previous_response_id chaining in the step loop) #16356

Description

@alvarosevilla95

Background

The OpenAI Responses API offers a WebSocket mode whose main benefit is an incremental inputs fast path: instead of re-sending the full conversation each turn, you set previous_response_id to the prior response and put only the new items (tool outputs + next user message) in input. The server keeps the most recent response in a connection-local in-memory cache.
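As a rough illustration of the difference between the two request shapes, here is a minimal sketch. The `model`, `input`, and `previous_response_id` fields follow the Responses API request body; the `InputItem` type is a simplified stand-in and the helper names `fullRequest` / `incrementalRequest` are hypothetical, not part of the SDK.

```typescript
// Simplified stand-in for Responses input items (assumption: the real
// item union is much larger and lives inside the SDK).
type InputItem =
  | { type: "message"; role: "user" | "assistant"; content: string }
  | { type: "function_call_output"; call_id: string; output: string };

interface ResponsesRequest {
  model: string;
  input: InputItem[];
  previous_response_id?: string | null;
}

// Full-history request: every turn carries the whole conversation.
function fullRequest(model: string, conversation: InputItem[]): ResponsesRequest {
  return { model, input: conversation };
}

// Incremental request: point at the prior response and send only what is new
// (tool outputs and the next user message).
function incrementalRequest(
  model: string,
  previousResponseId: string,
  newItems: InputItem[],
): ResponsesRequest {
  return { model, previous_response_id: previousResponseId, input: newItems };
}
```

With a three-item conversation whose last item is a fresh tool output, `incrementalRequest` would carry one item where `fullRequest` carries three.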

A WebSocket transport already exists (the ai-sdk-openai-websocket-fetch shim referenced in the docs), but it only delivers connection reuse — the incremental inputs fast path is still unreachable through the SDK, because:

  1. The automatic multi-step loop never chains previous_response_id. It is only populated from a caller-supplied providerOptions.openai.previousResponseId; the internal step loop captures response.id into result metadata but does not feed it back into the next request. So every step resends the full accumulated input.

Adding previous_response_id at the transport/fetch layer is unsafe given (1): the server would prepend the cached prior turn and still receive the full history in input, so the context is duplicated. Correct incremental inputs require the SDK to both set previous_response_id and trim input to only the new items, which only the SDK is positioned to do reliably (it owns item transformation: function_call id mapping, reasoning / encrypted-reasoning items, ordering).
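The duplication problem can be shown with a toy model. This is not the real server; `effectiveContext` is a hypothetical function that only assumes the behaviour described above (cached prior context is prepended whenever previous_response_id is set), and items are reduced to plain strings.

```typescript
type Item = string; // stand-in for a Responses input/output item

// What the model would effectively see for one request, under the
// assumption that the server prepends its cached context when chained.
function effectiveContext(
  cachedContext: Item[],
  previousResponseId: string | null,
  input: Item[],
): Item[] {
  return previousResponseId === null ? input : [...cachedContext, ...input];
}

const fullHistory: Item[] = ["user:hi", "call:getWeather", "output:sunny"];
const cached: Item[] = ["user:hi", "call:getWeather"]; // prior turn held by the server

// Transport-layer injection: previous_response_id added, input left untouched.
const injected = effectiveContext(cached, "resp_1", fullHistory);

// SDK-level chaining: previous_response_id set AND input trimmed to new items.
const trimmed = effectiveContext(cached, "resp_1", fullHistory.slice(cached.length));
```

In this sketch `injected` contains the first two items twice, while `trimmed` matches the original history exactly.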

Current behaviour

  • @ai-sdk/openai responses model sends the full message list on every step.
  • previous_response_id is set only from a caller-supplied providerOptions.openai.previousResponseId; the step loop never sets it.
  • Using the WebSocket fetch shim therefore yields connection reuse but not incremental inputs.

Desired behaviour

An opt-in mode where the automatic step loop chains previous_response_id and sends only new items (the streaming + tool-call loop being the primary win), so the WebSocket incremental-inputs path is actually usable.
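One possible shape for the bookkeeping such a mode would need, as a sketch only: the names `ChainState`, `buildStepRequest`, and `advance` are hypothetical, items are reduced to strings, and it assumes the step loop appends each response's output items to its accumulated input before adding tool results.

```typescript
type Item = string; // stand-in for a Responses input/output item
interface StepRequest { previous_response_id: string | null; input: Item[] }
interface StepResponse { id: string; output: Item[] }

// Tracks the last response id and how many accumulated items it already covers.
interface ChainState { previousResponseId: string | null; coveredCount: number }

// First step sends everything; later steps send only items the server lacks.
function buildStepRequest(state: ChainState, accumulated: Item[]): StepRequest {
  if (state.previousResponseId === null) {
    return { previous_response_id: null, input: [...accumulated] };
  }
  return {
    previous_response_id: state.previousResponseId,
    input: accumulated.slice(state.coveredCount),
  };
}

// After a response, its output items are part of the server-side chain,
// so they count as covered once appended to the accumulated input.
function advance(accumulated: Item[], response: StepResponse): ChainState {
  accumulated.push(...response.output);
  return { previousResponseId: response.id, coveredCount: accumulated.length };
}

const accumulated: Item[] = ["user:weather in Paris?"];
let state: ChainState = { previousResponseId: null, coveredCount: 0 };
const first = buildStepRequest(state, accumulated); // full input, no chaining
state = advance(accumulated, { id: "resp_1", output: ["call:getWeather"] });
accumulated.push("output:sunny"); // tool result produced locally
const second = buildStepRequest(state, accumulated); // only the tool output
```

Keeping the state outside the request builder means a reset (for example after a dropped socket) is just a return to `{ previousResponseId: null, coveredCount: 0 }`.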

Please also consider the reconnect semantics: with store: false / ZDR, a dropped socket (idle timeout, the 60-minute connection cap, or a transport error) invalidates the in-memory chain and an uncached previous_response_id returns previous_response_not_found. Any chaining mode should fall back to resending full context with previous_response_id: null on that error.
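The fallback could look roughly like the sketch below. It assumes the transport surfaces the API error code as a `code` property on the thrown value, which may not match how the SDK actually reports errors; `sendWithFallback` and `isChainLost` are hypothetical names, and `send` stands in for whatever performs the request.

```typescript
interface StepRequest { previous_response_id: string | null; input: string[] }

// Assumption: the error carries the API error code on a `code` property.
function isChainLost(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "previous_response_not_found"
  );
}

// Try the incremental request first; if the server no longer has the prior
// response, resend the full context once with previous_response_id: null.
async function sendWithFallback<R>(
  send: (req: StepRequest) => Promise<R>,
  incremental: StepRequest,
  fullInput: string[],
): Promise<R> {
  try {
    return await send(incremental);
  } catch (err) {
    // Only recover from a lost chain; everything else propagates unchanged.
    if (incremental.previous_response_id === null || !isChainLost(err)) throw err;
    return send({ previous_response_id: null, input: fullInput });
  }
}
```

The retry is deliberately limited to a single attempt so that an unrelated failure on the full-context request is not masked.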

Use case

Lower-latency multi-step tool loops and long conversations. With the WebSocket fetch shim we get the connection-reuse savings but not incremental inputs, because the SDK resends the full input and doesn't chain previous_response_id.

Related

AI SDK Version

  • ai@6.0.208
  • @ai-sdk/openai@3.0.74

Code of Conduct

  • I agree to follow this project's Code of Conduct
