Ticket [#26265] | Streaming responses API crashes on response.keep_alive heartbeat events (SDK 0.9.1) #266

@nachogarcia

Summary

The openrouter Python SDK (latest: 0.9.1) fails the entire SSE stream when OpenRouter emits a response.keep_alive heartbeat event on the POST /responses (streaming) endpoint. The StreamEvents discriminated union does not include a response.keep_alive member, so pydantic raises union_tag_invalid inside _parse_event. Because the failure happens inside the async generator stream_events_async, the generator is terminated and any subsequent content events (deltas, response.completed, etc.) are lost. From a caller's perspective, the request just dies mid-stream with a ValidationError.

This affects every streaming responses call that lasts long enough to receive a heartbeat. We see it on both openai/* and google/* model routes, so it is not model-specific: the OpenRouter proxy layer is what emits the keep-alive.

Environment

  • openrouter==0.9.1 (latest on PyPI; __gen_version__ = "2.788.4")
  • Python 3.13
  • pydantic==2.13.x
  • Endpoint: client.beta.responses.send_async(..., stream=True) returning text/event-stream

Stack trace (verbatim from production)

File ".../openrouter/utils/eventstreaming.py", line 124, in stream_events_async
    event, discard = _parse_event(block, decoder, sentinel)
File ".../openrouter/utils/eventstreaming.py", line 235, in _parse_event
    out = decoder(json.dumps(event.__dict__))
File ".../openrouter/responses.py", line 1243, in <lambda>
    lambda raw: utils.unmarshal_json(
        raw, operations.CreateResponsesResponseBody
    ).data,
File ".../openrouter/utils/serializers.py", line 140, in unmarshal_json
    return unmarshal(from_json(raw), typ)
File ".../openrouter/utils/serializers.py", line 150, in unmarshal
    m = unmarshaller(body=val)
File ".../pydantic/main.py", line 263, in __init__
    validated_self = self.__pydantic_validator__.validate_python(
        data, self_instance=self
    )
pydantic_core._pydantic_core.ValidationError: 1 validation error for Unmarshaller
body.data
  Input tag 'response.keep_alive' found using <lambda>() does not match any
  of the expected tags: 'error', 'response.completed',
  'response.content_part.added', 'response.content_part.done',
  'response.created', 'response.failed',
  'response.function_call_arguments.delta',
  'response.function_call_arguments.done',
  'response.image_generation_call.completed',
  'response.image_generation_call.generating',
  'response.image_generation_call.in_progress',
  'response.image_generation_call.partial_image', 'response.in_progress',
  'response.incomplete', 'response.output_item.added',
  'response.output_item.done', 'response.output_text.annotation.added',
  'response.output_text.delta', 'response.output_text.done',
  'response.reasoning_summary_part.added',
  'response.reasoning_summary_part.done',
  'response.reasoning_summary_text.delta',
  'response.reasoning_summary_text.done', 'response.reasoning_text.delta',
  'response.reasoning_text.done', 'response.refusal.delta',
  'response.refusal.done', 'response.web_search_call.completed',
  'response.web_search_call.in_progress',
  'response.web_search_call.searching'
  [type=union_tag_invalid, input_value={'type': 'response.keep_alive'}, input_type=dict]

Root cause

The union StreamEvents defined in openrouter/components/streamevents.py (Speakeasy-generated, lines ~138–204) does not list response.keep_alive as a discriminated member. The proxy now emits this event type for SSE heartbeats. The union's Discriminator is strict, so any unknown type tag raises immediately and aborts the generator.

This is consistent with OpenAI's Responses API keep-alive behavior, but the OpenRouter spec/SDK has not been updated to model it.
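
The failure mode can be reproduced without the SDK. The sketch below is a minimal stand-in, not the generated code: `Created`, `TextDelta`, `StreamEvent`, `get_tag`, and `error_type` are hypothetical names, and the union has two members instead of the thirty listed in the trace. It assumes pydantic v2 and a callable discriminator, which is what the `<lambda>()` in the error message suggests the generated union uses.

```python
from typing import Annotated, Any, Literal, Optional, Union

from pydantic import BaseModel, Discriminator, Tag, TypeAdapter, ValidationError


class Created(BaseModel):
    type: Literal["response.created"]


class TextDelta(BaseModel):
    type: Literal["response.output_text.delta"]
    delta: str


def get_tag(value: Any) -> Optional[str]:
    """Return the discriminator tag from a raw dict or a model instance."""
    if isinstance(value, dict):
        return value.get("type")
    return getattr(value, "type", None)


# Hypothetical two-member stand-in for the generated StreamEvents union.
StreamEvent = Annotated[
    Union[
        Annotated[Created, Tag("response.created")],
        Annotated[TextDelta, Tag("response.output_text.delta")],
    ],
    Discriminator(get_tag),
]

adapter = TypeAdapter(StreamEvent)


def error_type(payload: dict) -> Optional[str]:
    """Validate a payload and return the first pydantic error type, if any."""
    try:
        adapter.validate_python(payload)
    except ValidationError as exc:
        return exc.errors()[0]["type"]
    return None
```

Calling `error_type({"type": "response.keep_alive"})` on this stand-in yields the same `union_tag_invalid` error type that appears in the stack trace, because the tag is returned by the discriminator but matches no member.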

Reproduction

Any long-running streamed response will hit it once a heartbeat lands. A minimal repro:

import asyncio
from openrouter import OpenRouter

async def main():
    client = OpenRouter(api_key="...")
    stream = await client.beta.responses.send_async(
        model="openai/gpt-5.5",
        input=[{"role": "user", "content": "Write a 2000-word essay about ..."}],
        stream=True,
    )
    async with stream:
        async for event in stream:
            print(getattr(event, "type", None))

asyncio.run(main())
# -> pydantic_core._pydantic_core.ValidationError ... 'response.keep_alive'

Impact

For us: ~10–35 stream failures per hour on production chatbot endpoints. End users see "failed to generate" with no recovery. We can't catch the exception and resume the stream because the generator is already terminated, so any subsequent content events are lost.
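
Why catching the exception does not help can be shown with plain asyncio. This is a sketch of the general Python behavior, not of the SDK: `parse`, `stream_events`, `consume`, and `BLOCKS` are hypothetical stand-ins, and a `ValueError` plays the role of the pydantic `ValidationError`. Once an exception propagates out of an async generator, the generator is finalized, so iterating it again yields nothing.

```python
import asyncio
from typing import AsyncIterator, Iterable, List

# Hypothetical event sequence: a heartbeat arrives between content events.
BLOCKS = [
    "response.created",
    "response.keep_alive",
    "response.output_text.delta",
    "response.completed",
]


def parse(block: str) -> str:
    """Stand-in parser that rejects the unmodelled heartbeat tag."""
    if block == "response.keep_alive":
        raise ValueError(f"unknown event tag: {block}")
    return block


async def stream_events(blocks: Iterable[str]) -> AsyncIterator[str]:
    """Parse inside the generator, so a parse error escapes from it."""
    for block in blocks:
        yield parse(block)


async def consume(blocks: Iterable[str]) -> List[str]:
    seen: List[str] = []
    gen = stream_events(blocks)
    try:
        async for event in gen:
            seen.append(event)
    except ValueError:
        # Attempt to resume the same generator. It was finalized when the
        # exception propagated out of it, so this loop body never runs.
        async for event in gen:
            seen.append(event)
    return seen


if __name__ == "__main__":
    print(asyncio.run(consume(BLOCKS)))
```

Only the event before the heartbeat is collected; the delta and `response.completed` that follow it are never delivered to the caller.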

Suggested fix

Two options, listed by preference:

  1. Add response.keep_alive to the StreamEvents discriminated union. Model it as a no-op event whose only field is type: Literal["response.keep_alive"]. Callers that don't care can ignore it; callers that care can implement their own connection-health logic.

  2. Make _parse_event resilient to unknown event tags. Wrap the decoder(...) call in a try/except for pydantic.ValidationError with union_tag_invalid, log/discard, and continue. This protects against future unmodelled events the proxy may emit, but loses type-safety.

(1) is the right answer; (2) is a defensive backstop and would have prevented this bug from being a hard failure in the first place.
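
A rough sketch of what option (1) could look like, assuming pydantic v2. The class names, `get_tag`, and `collect_text` are hypothetical and the union is reduced to three members; the real union is Speakeasy-generated, so the actual change would presumably land in the OpenRouter spec and be regenerated rather than hand-written like this.

```python
from typing import Annotated, Any, Iterable, List, Literal, Optional, Union

from pydantic import BaseModel, Discriminator, Tag, TypeAdapter


class TextDelta(BaseModel):
    type: Literal["response.output_text.delta"]
    delta: str


class Completed(BaseModel):
    type: Literal["response.completed"]


class KeepAlive(BaseModel):
    """No-op heartbeat event; its only field is the type tag."""

    type: Literal["response.keep_alive"]


def get_tag(value: Any) -> Optional[str]:
    """Return the discriminator tag from a raw dict or a model instance."""
    if isinstance(value, dict):
        return value.get("type")
    return getattr(value, "type", None)


# Hypothetical reduced union with the heartbeat added as a tagged member.
StreamEvent = Annotated[
    Union[
        Annotated[TextDelta, Tag("response.output_text.delta")],
        Annotated[Completed, Tag("response.completed")],
        Annotated[KeepAlive, Tag("response.keep_alive")],
    ],
    Discriminator(get_tag),
]

adapter = TypeAdapter(StreamEvent)


def collect_text(payloads: Iterable[dict]) -> str:
    """Validate each payload and join text deltas, skipping heartbeats."""
    parts: List[str] = []
    for payload in payloads:
        event = adapter.validate_python(payload)
        if isinstance(event, KeepAlive):
            continue  # callers that don't care simply ignore it
        if isinstance(event, TextDelta):
            parts.append(event.delta)
    return "".join(parts)
```

With the heartbeat modelled, a payload sequence that interleaves `response.keep_alive` between deltas validates end to end, and callers that want connection-health logic can branch on `KeepAlive` instead of skipping it.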

Workaround we're applying meanwhile

We are monkey-patching openrouter.utils.eventstreaming._parse_event at import time so that it swallows union_tag_invalid errors whose tag starts with response. and returns (None, False) instead of raising. Happy to share the patch if useful.
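
The shape of that wrapper, as a hedged sketch rather than the actual patch: `tolerate_unknown_tags`, `is_unknown_response_tag`, and `fake_parse_event` are hypothetical names, and the stand-in parser below replaces the SDK's `_parse_event`, whose exact signature is not reproduced here. It assumes pydantic v2, where a `union_tag_invalid` error carries the offending tag under `ctx["tag"]`.

```python
import functools
from typing import Annotated, Any, Callable, Literal, Optional, Tuple, Union

from pydantic import BaseModel, Discriminator, Tag, TypeAdapter, ValidationError


def is_unknown_response_tag(exc: ValidationError) -> bool:
    """True only if every error is an unknown tag starting with 'response.'."""
    errors = exc.errors()
    return bool(errors) and all(
        err["type"] == "union_tag_invalid"
        and str(err.get("ctx", {}).get("tag", "")).startswith("response.")
        for err in errors
    )


def tolerate_unknown_tags(parse_fn: Callable[..., Tuple[Any, bool]]):
    """Wrap a parser so unknown response.* tags return (None, False)."""

    @functools.wraps(parse_fn)
    def wrapper(*args: Any, **kwargs: Any) -> Tuple[Any, bool]:
        try:
            return parse_fn(*args, **kwargs)
        except ValidationError as exc:
            if is_unknown_response_tag(exc):
                return None, False
            raise  # any other validation failure still surfaces

    return wrapper


# --- Hypothetical stand-in for the SDK parser, used only to exercise the wrapper.
class Created(BaseModel):
    type: Literal["response.created"]


class TextDelta(BaseModel):
    type: Literal["response.output_text.delta"]
    delta: str


def get_tag(value: Any) -> Optional[str]:
    if isinstance(value, dict):
        return value.get("type")
    return getattr(value, "type", None)


_adapter = TypeAdapter(
    Annotated[
        Union[
            Annotated[Created, Tag("response.created")],
            Annotated[TextDelta, Tag("response.output_text.delta")],
        ],
        Discriminator(get_tag),
    ]
)


@tolerate_unknown_tags
def fake_parse_event(payload: dict) -> Tuple[Any, bool]:
    return _adapter.validate_python(payload), False
```

The wrapper deliberately re-raises anything that is not an unknown `response.*` tag, so a known event with a missing field still fails loudly. Whether the stream loop treats a `(None, False)` result as "skip this block" depends on SDK internals and should be checked against the installed version before relying on it.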

References

  • Affected file: openrouter/utils/eventstreaming.py (lines 124, 235)
  • Affected union: openrouter/components/streamevents.py (StreamEvents)
  • Decoder definition: openrouter/responses.py:1243
