memory_compressor enters infinite litellm-timeout retry loop on long-context runs

## Summary

On long-running multi-target Strix scans (≈3+ hours wall-clock, with the agent producing roughly 10 findings before the freeze occurs), `strix.llm.memory_compressor._summarize_messages()` calls into litellm with the default 30-second request timeout. Once the agent's accumulated context exceeds roughly 150 KB, the summarisation request to Anthropic regularly takes longer than 30 seconds. litellm raises `litellm.exceptions.Timeout`, the agent retries the same call, and the cycle repeats indefinitely with no backoff and no terminal failure. The Strix process and Docker sandbox stay alive, CPU drops to ~1%, no new findings are written, and the scan never completes.

The agent was healthy up to that point, so this is a fail-stuck mode rather than an in-flight crash. The only externally visible symptom is that the log file stops growing for 30+ minutes.

## Reproduction

1. Strix `0.8.2`, Docker sandbox image `ghcr.io/usestrix/strix-sandbox:0.1.12`, host macOS 25.3.0 (Apple Silicon).
2. `~/.strix/cli-config.json` configured for Anthropic:
   ```json
   {
     "env": {
       "STRIX_LLM": "anthropic/claude-sonnet-4-6",
       "LLM_API_KEY": "<anthropic-key>"
     }
   }
   ```
3. Launch a multi-target standard-mode scan against four mid-size public web applications:
   ```bash
   strix \
     --target https://target1.example \
     --target https://target2.example \
     --target https://target3.example \
     --target https://target4.example \
     --scan-mode standard \
     --instruction-file instructions.txt \
     -n
   ```
4. Let the scan run for ~3 hours. After the agent has produced roughly 10 findings (during the third hour), the log stops growing.

Expected: the scan continues to produce findings until completion, or fails terminally with a clear error.

Actual: the log file freezes at the end of the most recent finding's writeup. The Strix process stays alive (CPU ≈ 1%, memory steady at ~2 GB / 8 GB available). The Docker sandbox container stays "Up". Re-checking the log every few minutes shows the same Python traceback being repeated:

```
Traceback (most recent call last):
  File "httpx/_client.py", line 1014, in _send_single_request
  File "httpx/_transports/default.py", line 249, in handle_request
  File "contextlib.py", line 158, in __exit__
  File "httpx/_transports/default.py", line 118, in map_httpcore_exceptions
httpx.ReadTimeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "litellm/llms/anthropic/chat/handler.py", line 446, in completion
  File "litellm/llms/custom_httpx/http_handler.py", line 964, in post
litellm.exceptions.Timeout: litellm.Timeout: Connection timed out after 30.0 seconds.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "litellm/main.py", line 2665, in completion
  File "litellm/llms/anthropic/chat/handler.py", line 461, in completion
litellm.llms.anthropic.common_utils.AnthropicError: litellm.Timeout: Connection timed out after 30.0 seconds.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "strix/llm/memory_compressor.py", line 120, in _summarize_messages
  File "litellm/utils.py", line 1739, in wrapper
  File "litellm/utils.py", line 1560, in wrapper
  File "litellm/main.py", line 4205, in completion
  File "litellm/litellm_core_utils/exception_mapping_utils.py", line 2356, in exception_type
  File "litellm/litellm_core_utils/exception_mapping_utils.py", line 672, in exception_type
litellm.exceptions.Timeout: litellm.Timeout: AnthropicException - litellm.Timeout: Connection timed out after 30.0 seconds.
```

In my run the same traceback repeated for 1h 27m before the process was manually killed. There is no observable retry-count limit and no exponential backoff between retries.

The Anthropic API itself was reachable from the host throughout — a separate `curl https://api.anthropic.com/v1/messages` handshake completed in 0.19s. The timeout is on the specific large-payload request from inside the agent, not network-level connectivity.

## Diagnosis

The traceback shows a 30-second request timeout in effect for the summarisation call. Anthropic's response time on a context-summarisation request scales with input-token count; once the agent's accumulated conversation history is large enough, the request reliably takes longer than 30 seconds. `_summarize_messages` catches the resulting `litellm.exceptions.Timeout` and retries, but the retry uses the same 30-second timeout against the same large input, so it fails again, indefinitely.

The fix has three parts:

1. **Raise the default timeout for `memory_compressor` calls.** 180–300 seconds is more appropriate than 30 for a request whose whole purpose is to chew through a long context. This can be done by passing `timeout=300` (or whatever the agent considers the maximum acceptable wait) to the litellm `completion()` call inside `_summarize_messages`.
2. **Add exponential backoff between retries** so a transient Anthropic slowdown can recover. Even with a higher timeout, a single large summarisation call can occasionally exceed the limit; with backoff, the next retry has a chance of landing.
3. **Add a retry budget.** After (say) 5 consecutive timeouts, raise a terminal error and exit non-zero, so the wrapper process / scheduler can surface a clean failure instead of hanging forever.

## Suggested patch

The simplest patch is in `strix/llm/memory_compressor.py` around line 120 (`_summarize_messages`):

```python
# Sketch; exact API depends on litellm version
import logging
import os
import time

import litellm

logger = logging.getLogger(__name__)

# Per-request timeout in seconds, overridable from the environment
MEMORY_COMPRESSOR_TIMEOUT_SEC = int(os.environ.get("STRIX_MEMORY_COMPRESSOR_TIMEOUT", "300"))
MAX_ATTEMPTS = 5      # retry budget: give up after this many consecutive timeouts
BASE_BACKOFF_SEC = 5  # first backoff; doubles on each subsequent retry

def _summarize_messages(self, messages):
    for attempt in range(MAX_ATTEMPTS):
        try:
            return litellm.completion(
                model=self.model,
                messages=messages,
                timeout=MEMORY_COMPRESSOR_TIMEOUT_SEC,
                # ...other args
            )
        except litellm.exceptions.Timeout:
            if attempt == MAX_ATTEMPTS - 1:
                # Budget exhausted: propagate so the caller can fail terminally
                raise
            sleep_for = BASE_BACKOFF_SEC * (2 ** attempt)
            logger.warning(
                "memory_compressor litellm timeout (attempt %d/%d); "
                "backing off %ds before retry",
                attempt + 1, MAX_ATTEMPTS, sleep_for,
            )
            time.sleep(sleep_for)
```
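
To sanity-check the retry budget, the worst-case time the patched call can block before raising can be worked out from the constants in the patch. This is illustrative arithmetic only, assuming every attempt runs to the full timeout:

```python
# Constants copied from the suggested patch above.
TIMEOUT_SEC = 300
MAX_ATTEMPTS = 5
BASE_BACKOFF_SEC = 5

# A backoff sleep follows every failed attempt except the last one.
backoffs = [BASE_BACKOFF_SEC * (2 ** attempt) for attempt in range(MAX_ATTEMPTS - 1)]

# Worst case: every attempt hits the full timeout, plus all backoff sleeps.
worst_case_sec = MAX_ATTEMPTS * TIMEOUT_SEC + sum(backoffs)

print(backoffs)        # [5, 10, 20, 40]
print(worst_case_sec)  # 1575
```

So with these values the compressor would give up after roughly 26 minutes at most, instead of looping for hours. The constants can be tuned if that ceiling is too high for interactive runs.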

A simpler interim fix that doesn't change the retry logic: **read `LITELLM_REQUEST_TIMEOUT` from the environment** and apply it everywhere litellm is called inside Strix. litellm already supports this env var globally, so a one-line change at the top of `memory_compressor.py` (or wherever the litellm client is constructed) to honour `float(os.environ.get("LITELLM_REQUEST_TIMEOUT", 30))` would let users raise the timeout without rebuilding Strix. The `float()` conversion matters because environment values arrive as strings.
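
A minimal sketch of how that environment lookup might be made robust to unset or malformed values. The helper name `resolve_request_timeout` is hypothetical and not part of Strix or litellm, and the variable name is taken from the paragraph above rather than verified against litellm:

```python
import math
import os

def resolve_request_timeout(default_sec: float = 30.0) -> float:
    """Hypothetical helper: timeout in seconds from LITELLM_REQUEST_TIMEOUT.

    Falls back to default_sec when the variable is unset, not a number,
    non-finite, or not positive.
    """
    raw = os.environ.get("LITELLM_REQUEST_TIMEOUT")
    if raw is None:
        return default_sec
    try:
        value = float(raw)
    except ValueError:
        return default_sec
    if not math.isfinite(value) or value <= 0:
        return default_sec
    return value
```

The result would then be passed as `timeout=` to each litellm `completion()` call, so a typo in the environment degrades to the current behaviour rather than crashing the scan at startup.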

## Workaround until a fix lands

1. Always launch Strix with `LITELLM_REQUEST_TIMEOUT=300` in the environment.
2. Run a periodic health check that watches the scan log for staleness; if the log has been stale for ≥30 minutes and shows this exact traceback signature, surface it to the user and ask whether to kill the process cleanly. Findings already written to `vulnerabilities/vuln-*.md` are usable independently of clean termination.
3. Never auto-kill — always confirm with the user first, since the agent could in principle recover during a long Anthropic slow-down.

This works around the symptom but doesn't fix the underlying retry loop, which is why I'm filing this issue.
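
The health check in step 2 could be sketched roughly as follows. The function name, the 64 KiB tail size, and the idea that the traceback may live in a different file from the scan log are all assumptions of this sketch; the log paths depend on how Strix is launched:

```python
import os
import time

STALE_AFTER_SEC = 30 * 60                  # threshold from step 2 above
SIGNATURE = "litellm.exceptions.Timeout"   # exception named in the traceback
FRAME = "memory_compressor.py"             # frame that identifies this bug

def looks_stuck(scan_log, error_log=None, now=None, tail_bytes=64 * 1024):
    """Return True if scan_log is stale and the traceback signature is present.

    error_log defaults to scan_log; pass a separate path if stderr is
    captured in a different file.
    """
    now = time.time() if now is None else now
    stale = (now - os.path.getmtime(scan_log)) >= STALE_AFTER_SEC

    # Only inspect the tail so very large logs stay cheap to check.
    with open(error_log or scan_log, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        fh.seek(max(0, fh.tell() - tail_bytes))
        tail = fh.read().decode("utf-8", errors="replace")

    return stale and SIGNATURE in tail and FRAME in tail
```

In line with step 3, a `True` result should only prompt the user; the sketch deliberately does not kill anything.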

## Environment

- Strix: `0.8.2` (have not yet retested on `0.8.3`; happy to do so if a fix is already in flight)
- Sandbox image: `ghcr.io/usestrix/strix-sandbox:0.1.12`
- LLM: `anthropic/claude-sonnet-4-6` via `litellm`
- Host: macOS 25.3.0, Apple Silicon, 16 GB RAM
- Docker Desktop running, ~7.65 GiB allocated to the sandbox
- Reproduced on a 4-target standard-mode multi-target run, ~3h wall-clock to onset of freeze, roughly 10 findings produced before freeze

## Severity (in my view)

Medium-to-high for any workflow that runs Strix on long multi-target standard-mode scans without manual babysitting. The findings already on disk are usable, so this isn't a data-loss bug — but it is a workflow killer for autonomous runs, and it consumes Anthropic tokens during the retry loop without producing value.

Happy to share more of the scan log on request, run repro on `0.8.3` if a fix has already shipped there, or test a candidate patch.

