
memory_compressor enters infinite litellm-timeout retry loop on long-context runs #470

@MagVeTs

Description


Summary

On long-running multi-target Strix scans (≈3+ hours wall-clock), strix.llm.memory_compressor._summarize_messages() calls into litellm with a 30-second request timeout (the value reported in the traceback below). Once the agent's accumulated context exceeds roughly 150 KB, the summarisation request to Anthropic regularly takes longer than 30 seconds. litellm raises litellm.exceptions.Timeout, the agent retries the same call, and the cycle repeats indefinitely with no backoff and no terminal failure. The Strix process and Docker sandbox stay alive, CPU drops to ~1%, no new findings are written, and the scan never completes.

The agent had produced roughly 10 findings before the freeze and was healthy until that point, so this is a fail-stuck mode rather than an in-flight crash. The only externally visible symptom is "log file stops growing for 30+ minutes."

Reproduction

  1. Strix 0.8.2, Docker sandbox image ghcr.io/usestrix/strix-sandbox:0.1.12, host macOS 25.3.0 (Apple Silicon).
  2. ~/.strix/cli-config.json configured for Anthropic:
    {
      "env": {
        "STRIX_LLM": "anthropic/claude-sonnet-4-6",
        "LLM_API_KEY": "<anthropic-key>"
      }
    }
  3. Launch a multi-target standard-mode scan against four mid-size public web applications:
    strix \
      --target https://target1.example \
      --target https://target2.example \
      --target https://target3.example \
      --target https://target4.example \
      --scan-mode standard \
      --instruction-file instructions.txt \
      -n
  4. Let the scan run for ~3 hours. After the agent has produced roughly 10 findings (during the third hour), the log stops growing.

Expected: the scan continues to produce findings until completion, or fails terminally with a clear error.

Actual: the log file freezes at the end of the most recent finding's writeup. The Strix process stays alive (CPU ≈ 1%, memory steady at ~2 GB / 8 GB available). The Docker sandbox container stays "Up". Re-checking the log every few minutes shows the same Python traceback being repeated:

Traceback (most recent call last):
  File "httpx/_client.py", line 1014, in _send_single_request
  File "httpx/_transports/default.py", line 249, in handle_request
  File "contextlib.py", line 158, in __exit__
  File "httpx/_transports/default.py", line 118, in map_httpcore_exceptions
httpx.ReadTimeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "litellm/llms/anthropic/chat/handler.py", line 446, in completion
  File "litellm/llms/custom_httpx/http_handler.py", line 964, in post
litellm.exceptions.Timeout: litellm.Timeout: Connection timed out after 30.0 seconds.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "litellm/main.py", line 2665, in completion
  File "litellm/llms/anthropic/chat/handler.py", line 461, in completion
litellm.llms.anthropic.common_utils.AnthropicError: litellm.Timeout: Connection timed out after 30.0 seconds.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "strix/llm/memory_compressor.py", line 120, in _summarize_messages
  File "litellm/utils.py", line 1739, in wrapper
  File "litellm/utils.py", line 1560, in wrapper
  File "litellm/main.py", line 4205, in completion
  File "litellm/litellm_core_utils/exception_mapping_utils.py", line 2356, in exception_type
  File "litellm/litellm_core_utils/exception_mapping_utils.py", line 672, in exception_type
litellm.exceptions.Timeout: litellm.Timeout: AnthropicException - litellm.Timeout: Connection timed out after 30.0 seconds.

In my run the same traceback repeated for 1h 27m before the process was manually killed. There is no observable retry-count limit and no exponential backoff between retries.

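The Anthropic API itself was reachable from the host throughout: a separate curl https://api.anthropic.com/v1/messages handshake completed in 0.19s. The timeout affects the specific large-payload request made from inside the agent, not network-level connectivity. Consistent with that, the innermost exception is httpx.ReadTimeout, meaning the connection was established but no response arrived within 30 seconds, despite litellm's "Connection timed out" wording.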

Diagnosis

The request timeout applied to these calls is 30 seconds (the value reported in the traceback). Anthropic's response time on a context-summarisation request scales with input-token count; once the agent's accumulated conversation history is large enough, the summarisation request reliably takes longer than 30 seconds. _summarize_messages catches the resulting litellm.exceptions.Timeout and retries, but the retry uses the same 30-second timeout against the same large input, so it fails again. With no retry limit, the loop never ends.

The fix has three parts:

  1. Raise the default timeout for memory_compressor calls. 180–300 seconds is more appropriate than 30 for a request whose whole purpose is to chew through a long context. This can be done by passing timeout=300 (or whatever the agent considers the maximum acceptable wait) to the litellm completion() call inside _summarize_messages.
  2. Add exponential backoff between retries so a transient Anthropic slowdown can recover. Even with a higher timeout, a single large summarisation call can occasionally exceed the limit; with backoff, the next retry has a chance of landing.
  3. Add a retry budget. After (say) 5 consecutive timeouts, raise a terminal error and exit non-zero, so the wrapper process / scheduler can surface a clean failure instead of hanging forever.
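
To choose concrete numbers for parts 1 to 3, it helps to work out how long a scan would stall before the retry budget turns the hang into a terminal error. The sketch below assumes the values from the suggested patch (5 attempts, a 300 s timeout, a 5 s base backoff that doubles on each retry) and that every attempt runs for the full timeout; the function names are hypothetical, and the figure ignores connection and scheduling overhead.

```python
def backoff_schedule(max_attempts: int, base_backoff_s: float) -> list[float]:
    """Sleeps between attempts: base * 2**attempt, with no sleep after the final attempt."""
    return [base_backoff_s * (2 ** attempt) for attempt in range(max_attempts - 1)]


def worst_case_wall_time_s(max_attempts: int, timeout_s: float, base_backoff_s: float) -> float:
    """Approximate time spent before a terminal error if every attempt times out."""
    return max_attempts * timeout_s + sum(backoff_schedule(max_attempts, base_backoff_s))


# Values from the suggested patch: 5 attempts, 300 s timeout, 5 s base backoff.
print(backoff_schedule(5, 5))                   # [5, 10, 20, 40]
print(worst_case_wall_time_s(5, 300, 5))        # 1575
print(worst_case_wall_time_s(5, 300, 5) / 60)   # 26.25
```

Under those assumptions the worst case is about 26 minutes, so a terminal error would surface shortly before a 30-minute staleness check like the one in the workaround fires.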

Suggested patch

The simplest patch is in strix/llm/memory_compressor.py around line 120 (_summarize_messages):

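# Sketch; assumes a litellm release whose completion() accepts timeout= (exact API may vary by version)
import logging
import os
import time

import litellm

logger = logging.getLogger(__name__)

# Per-request ceiling for summarisation calls; overridable via the environment.
MEMORY_COMPRESSOR_TIMEOUT_SEC = int(os.environ.get("STRIX_MEMORY_COMPRESSOR_TIMEOUT", "300"))
MAX_ATTEMPTS = 5       # retry budget: give up after this many consecutive timeouts
BASE_BACKOFF_SEC = 5   # first backoff; doubles on each retry (5, 10, 20, 40 s)

# Method on the memory-compressor class; self.model is its configured model name.
def _summarize_messages(self, messages):
    for attempt in range(MAX_ATTEMPTS):
        try:
            return litellm.completion(
                model=self.model,
                messages=messages,
                timeout=MEMORY_COMPRESSOR_TIMEOUT_SEC,
                # ...other args
            )
        except litellm.exceptions.Timeout:
            if attempt == MAX_ATTEMPTS - 1:
                # Budget exhausted: propagate so the scan fails terminally instead of hanging.
                raise
            sleep_for = BASE_BACKOFF_SEC * (2 ** attempt)
            logger.warning(
                "memory_compressor litellm timeout (attempt %d/%d); "
                "backing off %ds before retry",
                attempt + 1, MAX_ATTEMPTS, sleep_for,
            )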
            time.sleep(sleep_for)
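
One way to make this retry policy testable without calling Anthropic is to factor it into a small helper that takes the call and the sleep function as arguments. This is a minimal sketch, not existing Strix code: `call_with_backoff` is a hypothetical name, and inside Strix it would be used roughly as `call_with_backoff(lambda: litellm.completion(model=self.model, messages=messages, timeout=MEMORY_COMPRESSOR_TIMEOUT_SEC), litellm.exceptions.Timeout)`. The demo substitutes the built-in `TimeoutError` and a fake call so it runs without litellm installed.

```python
from __future__ import annotations

import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: type[BaseException] | tuple[type[BaseException], ...],
    max_attempts: int = 5,
    base_backoff_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on `retry_on`, back off exponentially and retry, re-raising after max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: let the caller fail terminally
            delay = base_backoff_s * (2 ** attempt)
            logger.warning(
                "attempt %d/%d timed out; backing off %.0fs", attempt + 1, max_attempts, delay
            )
            sleep(delay)
    raise AssertionError("unreachable")  # every iteration returns or raises


# Demo: a fake call that times out twice, then succeeds; sleeps are recorded, not performed.
calls: list[int] = []
sleeps: list[float] = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise TimeoutError("simulated slow summarisation")
    return "summary"


print(call_with_backoff(flaky, TimeoutError, sleep=sleeps.append))  # summary
print(sleeps)  # [5.0, 10.0]
```

Injecting `sleep` keeps tests instant; in production the default `time.sleep` applies.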

A simpler interim fix that doesn't change the retry logic: read LITELLM_REQUEST_TIMEOUT from the environment and apply it everywhere Strix calls litellm. litellm already supports this env var globally, so a one-line change at the top of memory_compressor.py (or wherever the litellm client is constructed) to honour float(os.environ.get("LITELLM_REQUEST_TIMEOUT", "30")) would let users raise the timeout without rebuilding Strix.
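
A sketch of that environment read, with one detail a one-liner can gloss over: `os.environ.get` returns a string whenever the variable is set, so the value has to be converted to a number before it is passed as `timeout=`. `timeout_from_env` is a hypothetical helper name, and the 30-second fallback simply mirrors the value seen in the traceback.

```python
import math
import os


def timeout_from_env(var: str = "LITELLM_REQUEST_TIMEOUT", default: float = 30.0) -> float:
    """Read a request timeout in seconds from the environment, falling back to `default`.

    Environment values are strings, so the value is converted explicitly; malformed,
    non-finite, or non-positive values raise instead of being silently ignored.
    """
    raw = os.environ.get(var, "").strip()
    if not raw:
        return default
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"{var}={raw!r} is not a number of seconds") from None
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"{var} must be a positive, finite number of seconds, got {raw!r}")
    return value


# Usage: pass the result as timeout= on each litellm call inside Strix.
os.environ["LITELLM_REQUEST_TIMEOUT"] = "300"
print(timeout_from_env())  # 300.0
```

Raising on a malformed value is a deliberate choice: silently falling back to 30 seconds would reproduce exactly the hang this workaround is meant to avoid.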

Workaround until a fix lands

  1. Always launch Strix with LITELLM_REQUEST_TIMEOUT=300 in the environment.
  2. Run a periodic health-check that watches the scan log for staleness; on ≥30 min stale + this exact traceback signature, surface to the user and ask whether to kill cleanly. Findings already written to vulnerabilities/vuln-*.md are usable independently of clean termination.
  3. Never auto-kill — always confirm with the user first, since the agent could in principle recover during a long Anthropic slow-down.
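
The health check in step 2 could look roughly like this. It is a sketch under assumptions: this issue does not specify where Strix writes its logs, so the function takes the paths as arguments (`trace_log` can point at a different file if the repeating traceback is written somewhere other than the frozen scan log), and the two signature substrings are copied from the traceback above. In keeping with step 3, it only reports a likely hang; the decision to kill the process stays with the user.

```python
from __future__ import annotations

import time
from pathlib import Path

# Substrings from the traceback in this issue; both must appear for a match.
SIGNATURE = ("strix/llm/memory_compressor.py", "litellm.exceptions.Timeout")


def _tail(path: Path, max_bytes: int) -> str:
    """Return roughly the last max_bytes of a file, decoded leniently."""
    size = path.stat().st_size
    with path.open("rb") as f:
        f.seek(max(0, size - max_bytes))
        return f.read().decode("utf-8", errors="replace")


def looks_stuck(
    scan_log: Path,
    trace_log: Path | None = None,
    stale_after_s: float = 30 * 60,
    tail_bytes: int = 64 * 1024,
    now: float | None = None,
) -> bool:
    """True if scan_log has not changed for stale_after_s AND the traceback signature
    appears near the end of trace_log (defaults to scan_log). Only reports; never kills.
    """
    now = time.time() if now is None else now
    if now - scan_log.stat().st_mtime < stale_after_s:
        return False
    tail = _tail(trace_log or scan_log, tail_bytes)
    return all(marker in tail for marker in SIGNATURE)
```

A scheduler would call `looks_stuck(...)` every few minutes and, on `True`, notify the user rather than terminating Strix.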

This works around the symptom but doesn't fix the underlying retry loop, which is why I'm filing this issue.

Environment

  • Strix: 0.8.2 (have not yet retested on 0.8.3; happy to do so if a fix is already in flight)
  • Sandbox image: ghcr.io/usestrix/strix-sandbox:0.1.12
  • LLM: anthropic/claude-sonnet-4-6 via litellm
  • Host: macOS 25.3.0, Apple Silicon, 16 GB RAM
  • Docker Desktop running, ~7.65 GiB allocated to the sandbox
  • Reproduced on a 4-target standard-mode multi-target run, ~3h wall-clock to onset of freeze, roughly 10 findings produced before freeze

Severity (in my view)

Medium-to-high for any workflow that runs Strix on long multi-target standard-mode scans without manual babysitting. The findings already on disk are usable, so this isn't a data-loss bug — but it is a workflow killer for autonomous runs, and it consumes Anthropic tokens during the retry loop without producing value.

Happy to share more of the scan log on request, run repro on 0.8.3 if a fix has already shipped there, or test a candidate patch.
