Fix: AttributeError: 'Stream' object has no attribute 'choices' and Real-Time Streaming#2145

Open
ramikhafagi96 wants to merge 4 commits into 567-labs:main from ramikhafagi96:fix-tokens-streaming

Conversation

@ramikhafagi96

Fix streaming behavior in create() and create_partial()

Summary

This PR fixes two issues in the streaming path:

  1. create(stream=True) crashes with
    AttributeError: 'Stream' object has no attribute 'choices'.
  2. create_partial() buffers the entire stream into a list, so partial
    models are not streamed to the caller in real time.

The changes enable true streaming for partial models and ensure
create(stream=True) works correctly.


Problems

1. create(stream=True) crash

When stream=True is used without Partial, the OpenAI Stream object
is passed directly to process_response().

Execution falls through to from_response(), which expects a
ChatCompletion object and accesses completion.choices. Since
Stream is an iterator, this results in:

AttributeError: 'Stream' object has no attribute 'choices'
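The failure mode can be reproduced without the SDK: a Stream is just an iterator of chunks, so attribute access on it fails the same way from_response()'s completion.choices does. (fake_stream below is a stand-in, not the real openai.Stream.)

```python
def fake_stream():
    """Stands in for openai.Stream: yields chunks, has no .choices."""
    yield {"delta": "Hello"}
    yield {"delta": " world"}

stream = fake_stream()
try:
    stream.choices  # what from_response() effectively does
except AttributeError as err:
    print(err)  # 'generator' object has no attribute 'choices'
```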

2. create_partial() disables streaming

create_partial() wraps the model with Partial and sets
stream=True, but process_response() consumes the generator:

return list(
    response_model.from_streaming_response(response, mode=mode)
)

Because list() eagerly consumes the generator, the entire stream is
buffered before returning. As a result, partial models are not yielded
in real time.
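The difference is easy to see with a simulated stream (partial_models stands in for from_streaming_response and is not the real method):

```python
def partial_models():
    """Stands in for from_streaming_response(): one partial per chunk."""
    for i in range(3):
        yield {"partial": i}

buffered = list(partial_models())  # blocks until every chunk has arrived
lazy = partial_models()            # returns immediately; nothing consumed yet
first = next(lazy)                 # only the first chunk is pulled
```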


Fix

Generator passthrough for partial streaming

Return the generator directly instead of wrapping it in list().

Before:

return list(
    response_model.from_streaming_response(response, mode=mode)
)

After:

return response_model.from_streaming_response(response, mode=mode)

This allows callers to iterate over partial models as they arrive.
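With the passthrough in place, a caller-side loop sees each partial as soon as it is yielded. A minimal sketch with a simulated stream (the field values are made up for the demo):

```python
def from_streaming_response():
    """Simulated partial-model stream; values are illustrative only."""
    for name in (None, "Ja", "Jason"):
        yield {"name": name}

# Each partial is usable the moment it arrives, not after the stream ends:
seen = []
for partial in from_streaming_response():
    seen.append(partial["name"])
```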


Fallback for create(stream=True)

Added _accumulate_stream() to consume a raw Stream and construct a
synthetic ChatCompletion before passing it to from_response().

This prevents the 'Stream' object has no attribute 'choices' crash.
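A hedged sketch of what such a helper might do; the name _accumulate_stream matches the PR, but the chunk shape and return structure below are assumptions, not the PR's exact code:

```python
def _accumulate_stream(chunks):
    """Concatenate streamed deltas into one synthetic completion (sketch)."""
    content = []
    for chunk in chunks:
        delta = chunk.get("delta")  # assumed chunk shape for this demo
        if delta:
            content.append(delta)
    # Shaped like the ChatCompletion structure from_response() reads:
    return {"choices": [{"message": {"content": "".join(content)}}]}

completion = _accumulate_stream(iter([{"delta": "Hel"}, {"delta": "lo"}]))
```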


Retry behavior with streaming

Retry logic is fundamentally incompatible with streaming partial
responses.

When create_partial() streams results, partial models may already be
yielded to the caller. If validation fails later in the stream, retrying
the request would require retracting previously yielded results, which
is not possible.

For this reason, when process_response() returns a generator
(streaming Partial case), retry_sync now returns it directly instead
of attempting retry logic.

This preserves existing retry behavior for non-streaming calls,
while allowing streaming generators to pass through unchanged.
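The pass-through check can be sketched as follows; this is a simplified stand-in (the real retry_sync takes more arguments and wraps validation in proper retry logic):

```python
from collections.abc import Generator

def retry_sync(process, max_retries=3):
    """Simplified: generators bypass retries; plain results keep them."""
    for _ in range(max_retries):
        result = process()
        if isinstance(result, Generator):
            # Streaming Partial case: partials may already have been
            # yielded to the caller, so a retry cannot retract them.
            return result
        if result is not None:  # stand-in for "validation passed"
            return result
    raise RuntimeError("all retries failed")
```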


Changes

response.py
  • Return the generator for PartialBase + stream
  • Add _accumulate_stream() helper
  • Add fallback handling for create(stream=True)

retry.py
  • Add Generator import
  • Return generators directly in retry_sync


Impact

  • create_partial() now streams partial models in real time
  • create(stream=True) no longer crashes
  • Async and non-streaming paths remain unchanged

Tested with instructor==1.14.1.

@ramikhafagi96 ramikhafagi96 changed the title Fix Streaming Fix: AttributeError: 'Stream' object has no attribute 'choices' and Real-Time Streaming Mar 17, 2026
@jxnl
Collaborator

jxnl commented Mar 18, 2026

I don’t have time to take this one over right now. I’m not going to merge it as-is, but I’ll revisit the streaming design later when I can review the retry/reask semantics and add the missing coverage.
