
fix: add request timeout to prevent infinite hangs during scoring loop #56

Open
TristanSchneider-dev wants to merge 1 commit into ISG-Siegen:develop from leonlenz:fix/random-freeze
TristanSchneider-dev commented Jun 8, 2026

Add request timeout to prevent infinite hangs during scoring loop

Midway through the pipeline, I noticed that it got stuck indefinitely because the API stopped responding. :(

Problem

The pipeline can hang indefinitely when the OpenAI API server stops responding mid-request.

Error flow

During the detailed requirement scoring loop in minimal_agent.py, every requirement triggers a separate API call. Earlier logs show a 503 Service Unavailable response for a call that eventually succeeded on retry. At some point the API connection stalled silently: the traceback shows httpcore blocked on self._protocol.read_event.wait(), with no timeout configured on ChatOpenAI. The asyncio.exceptions.CancelledError appeared only after a manual KeyboardInterrupt; without it, the process hangs forever.
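The failure mode above can be reproduced without any network access. The following is a minimal sketch, not the project's code: `stalled_read` and `read_with_deadline` are hypothetical names, and an `asyncio.Event` that is never set stands in for the socket read that never returns.

```python
import asyncio


async def stalled_read() -> bytes:
    """Stand-in for a socket read whose peer never answers."""
    never_set = asyncio.Event()
    await never_set.wait()  # nothing ever sets the event, so this waits until cancelled
    return b""


async def read_with_deadline(seconds: float) -> str:
    """Bound the wait so a stalled read fails instead of hanging."""
    try:
        await asyncio.wait_for(stalled_read(), timeout=seconds)
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"


def demo() -> str:
    # Awaiting stalled_read() directly would block until the process is interrupted.
    return asyncio.run(read_with_deadline(0.05))
```

Without the deadline, the only way out of `stalled_read` is cancellation, which matches the CancelledError seen after the KeyboardInterrupt.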

Fix

Added timeout and max_retries to the ChatOpenAI constructor in treesearch/llm/query.py:

_cfg = get_config()
model = ChatOpenAI(
    model=self._model,
    temperature=self._temperature,
    use_responses_api=True,
    timeout=_cfg.agent.code.request_timeout,   # 120 seconds
    max_retries=_cfg.agent.code.max_retries,    # 3 retries
)

Both parameters are reused from the existing settings in config.toml:

| Parameter | Value | Description |
| --- | --- | --- |
| request_timeout | 120 s | Timeout per API call |
| max_retries | 3 | Automatic retry attempts |

Why it works

A 120-second timeout ensures unresponsive calls raise an httpx.TimeoutException (surfaced by the OpenAI client as openai.APITimeoutError) instead of hanging forever.
The exception is caught by the existing try/except blocks in minimal_agent.py, which mark the requirement as is_fulfilled=False and continue the pipeline gracefully.
3 automatic retries handle transient failures (like the observed 503) without manual intervention.
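The combined behaviour described above (bounded wait, a few retries, then mark the requirement as failed and move on) can be sketched as follows. This is an illustrative model only: `Score`, `call_with_retries`, `score_all`, and `fake_judge` are hypothetical names, the hand-written retry helper stands in for the retries the OpenAI client performs internally, and the short timeout is chosen so the demo finishes quickly.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Score:
    requirement: str
    is_fulfilled: bool
    note: str = ""


async def call_with_retries(make_call, timeout: float, max_retries: int):
    """Make one attempt plus up to max_retries more, each bounded by timeout."""
    last_error = None
    for _ in range(max_retries + 1):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc  # transient failure: try again
    raise last_error


async def score_all(requirements, judge, timeout=0.05, max_retries=1):
    """Score every requirement; a failed call never stops the loop."""
    results = []
    for req in requirements:
        try:
            ok = await call_with_retries(lambda: judge(req), timeout, max_retries)
            results.append(Score(req, bool(ok)))
        except Exception as exc:
            # Mirrors the described fallback: record the failure and continue.
            results.append(Score(req, False, note=type(exc).__name__))
    return results


async def fake_judge(req: str) -> bool:
    """Hypothetical scorer: one requirement simulates a stalled connection."""
    if req == "stalls":
        await asyncio.Event().wait()  # never completes on its own
    return True


def demo():
    return asyncio.run(score_all(["ok", "stalls", "also ok"], fake_judge))
```

In this sketch the stalled requirement costs at most `timeout * (max_retries + 1)` seconds before the loop continues with the next one.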
