Add Realtime Reasoning API support for gpt-realtime-2 #284

Open
richarddas wants to merge 1 commit into AIProxyTeam:main from richarddas:feature/realtime-reasoning-api-parity

@richarddas

Contributor

Summary

Adds first-class support for OpenAI Realtime Reasoning models such as gpt-realtime-2, while keeping existing Performance Realtime call sites (gpt-realtime-1.5, etc.) unchanged.

  • New types: OpenAIRealtimeReasoningConfiguration, OpenAIRealtimeReasoningSessionConfiguration, OpenAIRealtimeReasoningResponseCreate
  • OpenAIService.realtimeSession overload for Reasoning session configuration
  • Wire encoding merges reasoning.effort and parallel_tool_calls into the existing session.update / response.create session payload
  • Decodes phased Realtime output (commentary vs final_answer) on response.done, output item events, and conversation item events
  • README examples for gpt-realtime-2 and phased responses; schema matrix at Documentation/OpenAI/RealtimeSchemaMatrix.md
  • Encoding/decoding tests for Reasoning fields and phases, plus Performance regression tests
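
As a rough illustration of the phased-output bullet above, here is a minimal sketch of decoding a phase from an output item. The type names `OutputPhase` and `OutputItem` and the JSON key `phase` are assumptions made for this example, not the types or wire shape added in this PR; only the values `commentary` and `final_answer` come from the description.

```swift
import Foundation

// Hypothetical stand-ins for illustration; not the types added by this PR.
enum OutputPhase: Decodable, Equatable {
    case commentary
    case finalAnswer
    case unknown(String)  // tolerate phase values this sketch does not know about

    init(from decoder: Decoder) throws {
        let raw = try decoder.singleValueContainer().decode(String.self)
        switch raw {
        case "commentary": self = .commentary
        case "final_answer": self = .finalAnswer
        default: self = .unknown(raw)
        }
    }
}

struct OutputItem: Decodable {
    let id: String
    // Assumed key name. Optional, so items without a phase still decode.
    let phase: OutputPhase?
}

let phasedJSON = Data(#"{"id": "item_1", "phase": "final_answer"}"#.utf8)
let plainJSON = Data(#"{"id": "item_2"}"#.utf8)

let phased = try JSONDecoder().decode(OutputItem.self, from: phasedJSON)
let plain = try JSONDecoder().decode(OutputItem.self, from: plainJSON)
```

Keeping `phase` optional is one way to let Performance-style items, which carry no phase, decode through the same type.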

Why

Callers using reasoning voice models need reasoning.effort and parallel_tool_calls on session and response-create, and need to handle phased output items. Performance Realtime usage should remain the same API surface with no new required parameters.

Test plan

  • On-device smoke test with gpt-realtime-2 (session connect, Reasoning configuration, phased output)
  • swift test --filter OpenAIRealtime
  • swift test

Migration notes

Performance (unchanged):

let session = try await openAIService.realtimeSession(
    model: "gpt-realtime-1.5",
    configuration: .init(),
    logLevel: .info
)

Reasoning (new):

let session = try await openAIService.realtimeSession(
    model: "gpt-realtime-2",
    configuration: OpenAIRealtimeReasoningSessionConfiguration(
        session: OpenAIRealtimeSessionConfiguration(
            outputModalities: [.audio],
            voice: .builtin("alloy")
        ),
        reasoning: .init(effort: .low),
        parallelToolCalls: true
    ),
    logLevel: .info
)

Reasoning session configuration requires an explicit session: argument so existing configuration: .init() call sites continue to resolve to Performance configuration without ambiguity.
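
The overload-resolution point can be sketched in isolation. The types and the `makeSession` function below are hypothetical stand-ins, not the library's API; the sketch only shows why a required `session:` argument keeps a bare `.init()` unambiguous when two overloads differ by configuration type.

```swift
// Hypothetical stand-in for the existing Performance configuration.
struct PerformanceConfig {
    var voice: String?
    init(voice: String? = nil) { self.voice = voice }
}

// Hypothetical stand-in for the Reasoning wrapper.
struct ReasoningConfig {
    var session: PerformanceConfig
    var effort: String
    // `session` has no default value, so `.init()` is not a valid
    // initializer call for this type.
    init(session: PerformanceConfig, effort: String = "low") {
        self.session = session
        self.effort = effort
    }
}

func makeSession(model: String, configuration: PerformanceConfig) -> String {
    "performance"
}

func makeSession(model: String, configuration: ReasoningConfig) -> String {
    "reasoning"
}

// Only PerformanceConfig accepts a zero-argument init, so this call
// selects the first overload.
let a = makeSession(model: "gpt-realtime-1.5", configuration: .init())

// Spelling out the wrapper type selects the second overload.
let b = makeSession(
    model: "gpt-realtime-2",
    configuration: ReasoningConfig(session: .init())
)
```

If `ReasoningConfig` also offered a zero-argument initializer, the first call would have two viable candidates and the compiler would be expected to report it as ambiguous.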

Made with Cursor

Includes Realtime Reasoning session and response-create types for reasoning effort and parallel tool calls while preserving existing Performance Realtime call sites.

Decodes phased Realtime output for commentary and final answer items across response completion, output item, and conversation item events.

Documents the current Realtime schema mapping and README examples, and removes obsolete Realtime GA/beta terminology.

Adds focused encoding and decoding tests for the new wire shapes and compatibility behavior.

Co-authored-by: Cursor <cursoragent@cursor.com>
@richarddas

Contributor Author

Also removed references to "GA" from the previous work, since OpenAI have closed the beta wire as of May 12, 2026.

@lzell self-assigned this Jun 3, 2026
@richarddas

Contributor Author

Looking at this a couple of days later with fresh eyes, I think the ergonomics can still be improved. But I have smoke tested this on device, and it's functional.

//

/// `response.create` for Realtime Reasoning models.
nonisolated public struct OpenAIRealtimeReasoningResponseCreate: Encodable {

Collaborator

I'm curious whether we need a new response.create type versus updating the existing OpenAIRealtimeResponseCreatedEvent. Right now that type has a responseID? on it, but it seems like we could put a Response? on it too.

Collaborator

Oh, disregard. I confused response.create with response.created

//

/// Configuration for OpenAI Realtime Reasoning models such as `gpt-realtime-2`.
nonisolated public struct OpenAIRealtimeReasoningConfiguration: Encodable, Sendable {

Collaborator

I think the main thing I'd like to understand before merging is whether we need this separate ReasoningConfiguration and a separate initializer in OpenAIRealtimeSession. IIUC, a more surgical change would be to modify OpenAIRealtimeSessionConfiguration by adding a member: let reasoning: OpenAIRealtimeReasoning?.

The OpenAIRealtimeReasoning type would have a single member, effort, much like your current type OpenAIRealtimeReasoningConfiguration.

I don't see any real control flow or network sequencing differences between reasoning and non-reasoning versions right now, so I think this would be a simpler change. Let me know if I'm missing something @richarddas

Collaborator

And a nit: for any new types that you do create, can you use one file per public type and pull them into a new folder, OpenAI/Realtime? (You can see the existing example of OpenAI/Conversations.) I want to start organizing the realtime files ahead of our eventual split of this repo into several single-purpose clients. That will make the work down the road a bit easier.

Contributor Author

My original thinking was around keeping Performance and Reasoning explicit at the callsite, but you make a solid point that the rest of the sequencing collapses the two anyway. Since models are provided as strings, the wrapper also doesn’t actually enforce that gpt-realtime-2 uses the Reasoning config. So the wrapper is probably overkill.

I’ll fold reasoning and parallelToolCalls into the existing session and response-create types, while keeping reasoning as a grouped value so the Reasoning intent is still explicit at the callsite.
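
A minimal sketch of what that folded shape could look like, under stated assumptions: `Reasoning` and `SessionConfiguration` are trimmed-down hypothetical types, not the library's actual definitions, and only the wire keys `reasoning.effort` and `parallel_tool_calls` are taken from this PR. Because the new members are optional, synthesized `Encodable` conformance omits them when they are nil, so a Performance-style payload stays unchanged.

```swift
import Foundation

// Hypothetical, trimmed-down types; names and fields are illustrative only.
struct Reasoning: Encodable {
    // Only `.low` appears in this PR; the other cases are assumed.
    enum Effort: String, Encodable { case low, medium, high }
    let effort: Effort
}

struct SessionConfiguration: Encodable {
    var voice: String? = nil
    var reasoning: Reasoning? = nil        // grouped value, nil for Performance models
    var parallelToolCalls: Bool? = nil

    enum CodingKeys: String, CodingKey {
        case voice
        case reasoning
        case parallelToolCalls = "parallel_tool_calls"
    }
}

// Encodes with sorted keys so the output is stable for comparison.
func wireJSON(_ configuration: SessionConfiguration) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    return String(decoding: try encoder.encode(configuration), as: UTF8.self)
}

// Performance-style call site: no reasoning keys are emitted.
let performanceJSON = try wireJSON(SessionConfiguration(voice: "alloy"))

// Reasoning-style call site: the grouped value and flag are emitted.
let reasoningJSON = try wireJSON(
    SessionConfiguration(
        voice: "alloy",
        reasoning: Reasoning(effort: .low),
        parallelToolCalls: true
    )
)
```

With this shape, `performanceJSON` is `{"voice":"alloy"}` and `reasoningJSON` is `{"parallel_tool_calls":true,"reasoning":{"effort":"low"},"voice":"alloy"}`.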
