Summary
On long-running multi-target Strix scans (≈3+ hours wall-clock, with the agent producing roughly 10 findings before the freeze occurs), strix.llm.memory_compressor._summarize_messages() calls into litellm with the default 30-second request timeout. Once the agent's accumulated context exceeds roughly 150 KB, the summarisation request to Anthropic regularly takes longer than 30 seconds. litellm raises litellm.exceptions.Timeout, the agent retries the same call, and the cycle repeats indefinitely with no backoff and no terminal failure. The Strix process and Docker sandbox stay alive, CPU drops to ~1%, no new findings are written, and the scan never completes.
The agent was healthy until that point, so this is a fail-stuck mode rather than an in-flight crash. The only externally visible symptom is "log file stops growing for 30+ minutes."
Reproduction
- Strix 0.8.2, Docker sandbox image ghcr.io/usestrix/strix-sandbox:0.1.12, host macOS 25.3.0 (Apple Silicon). ~/.strix/cli-config.json configured for Anthropic:
{
  "env": {
    "STRIX_LLM": "anthropic/claude-sonnet-4-6",
    "LLM_API_KEY": "<anthropic-key>"
  }
}
- Launch a multi-target standard-mode scan against four mid-size public web applications:
strix \
  --target https://target1.example \
  --target https://target2.example \
  --target https://target3.example \
  --target https://target4.example \
  --scan-mode standard \
  --instruction-file instructions.txt \
  -n
- Let the scan run for ~3 hours. After the agent has produced roughly 10 findings (during the third hour), the log stops growing.
Expected: the scan continues to produce findings until completion, or fails terminally with a clear error.
Actual: the log file freezes at the end of the most recent finding's writeup. The Strix process stays alive (CPU ≈ 1%, memory steady at ~2 GB / 8 GB available). The Docker sandbox container stays "Up". Re-checking the log every few minutes shows the same Python traceback being repeated:
Traceback (most recent call last):
File "httpx/_client.py", line 1014, in _send_single_request
File "httpx/_transports/default.py", line 249, in handle_request
File "contextlib.py", line 158, in __exit__
File "httpx/_transports/default.py", line 118, in map_httpcore_exceptions
httpx.ReadTimeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "litellm/llms/anthropic/chat/handler.py", line 446, in completion
File "litellm/llms/custom_httpx/http_handler.py", line 964, in post
litellm.exceptions.Timeout: litellm.Timeout: Connection timed out after 30.0 seconds.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "litellm/main.py", line 2665, in completion
File "litellm/llms/anthropic/chat/handler.py", line 461, in completion
litellm.llms.anthropic.common_utils.AnthropicError: litellm.Timeout: Connection timed out after 30.0 seconds.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "strix/llm/memory_compressor.py", line 120, in _summarize_messages
File "litellm/utils.py", line 1739, in wrapper
File "litellm/utils.py", line 1560, in wrapper
File "litellm/main.py", line 4205, in completion
File "litellm/litellm_core_utils/exception_mapping_utils.py", line 2356, in exception_type
File "litellm/litellm_core_utils/exception_mapping_utils.py", line 672, in exception_type
litellm.exceptions.Timeout: litellm.Timeout: AnthropicException - litellm.Timeout: Connection timed out after 30.0 seconds.
In my run the same traceback repeated for 1h 27m before the process was manually killed. There is no observable retry-count limit and no exponential backoff between retries.
The Anthropic API itself was reachable from the host throughout — a separate curl https://api.anthropic.com/v1/messages handshake completed in 0.19s. The timeout is on the specific large-payload request from inside the agent, not network-level connectivity.
Diagnosis
The default litellm request timeout is 30 seconds. Anthropic's response time on a context-summarisation request scales with input-token count; once the agent's accumulated conversation history is large enough, the summarisation request reliably takes longer than 30 seconds. _summarize_messages catches the resulting litellm.exceptions.Timeout and retries — but the retry uses the same 30-second timeout against the same large input, so it fails again, indefinitely.
The fix has three parts:
- Raise the default timeout for memory_compressor calls. 180–300 seconds is more appropriate than 30 for a request whose whole purpose is to chew through a long context. This can be done by passing timeout=300 (or whatever the agent considers the maximum acceptable wait) to the litellm completion() call inside _summarize_messages.
- Add exponential backoff between retries so a transient Anthropic slowdown can recover. Even with a higher timeout, a single large summarisation call can occasionally exceed the limit; with backoff, the next retry has a chance of landing.
- Add a retry budget. After (say) 5 consecutive timeouts, raise a terminal error and exit non-zero, so the wrapper process / scheduler can surface a clean failure instead of hanging forever.
Suggested patch
The simplest patch is in strix/llm/memory_compressor.py around line 120 (_summarize_messages):
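# Sketch; exact API depends on litellm version
import logging
import os
import time

import litellm

logger = logging.getLogger(__name__)

# Per-request timeout for summarisation calls, overridable from the environment.
MEMORY_COMPRESSOR_TIMEOUT_SEC = int(os.environ.get("STRIX_MEMORY_COMPRESSOR_TIMEOUT", 300))
MAX_ATTEMPTS = 5      # retry budget: give up after this many consecutive timeouts
BASE_BACKOFF_SEC = 5  # first backoff; doubles after each further timeout

# Method of the compressor class, shown unindented for brevity.
def _summarize_messages(self, messages):
    for attempt in range(MAX_ATTEMPTS):
        try:
            return litellm.completion(
                model=self.model,
                messages=messages,
                timeout=MEMORY_COMPRESSOR_TIMEOUT_SEC,
                # ...other args
            )
        except litellm.exceptions.Timeout:
            # Budget exhausted: re-raise so the caller can fail terminally.
            if attempt == MAX_ATTEMPTS - 1:
                raise
            sleep_for = BASE_BACKOFF_SEC * (2 ** attempt)
            logger.warning(
                "memory_compressor litellm timeout (attempt %d/%d); "
                "backing off %ds before retry",
                attempt + 1, MAX_ATTEMPTS, sleep_for,
            )
            time.sleep(sleep_for)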
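For sizing the constants, it may help to know how long the loop above can block before it gives up. The sketch below is only arithmetic on the proposed values (300 s timeout, 5 attempts, 5 s base backoff); the helper name worst_case_wait_seconds is hypothetical and not part of Strix or litellm.

```python
def worst_case_wait_seconds(timeout_sec: int, max_attempts: int, base_backoff_sec: int) -> int:
    """Upper bound on wall-clock time before the retry loop raises."""
    # Every attempt may run for the full request timeout.
    in_flight = timeout_sec * max_attempts
    # A backoff sleep follows every failed attempt except the last one.
    backoff = sum(base_backoff_sec * (2 ** attempt) for attempt in range(max_attempts - 1))
    return in_flight + backoff


if __name__ == "__main__":
    total = worst_case_wait_seconds(timeout_sec=300, max_attempts=5, base_backoff_sec=5)
    print(f"worst case before terminal failure: {total} s")
```

With those values the bound is 1500 s of in-flight requests plus 5 + 10 + 20 + 40 = 75 s of backoff, i.e. 1575 s, a little over 26 minutes. That is far shorter than the 1h 27m hang observed here, and it stays under the 30-minute staleness threshold used in the workaround further down.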
A simpler interim fix that doesn't change the retry logic: read LITELLM_REQUEST_TIMEOUT from the environment and apply it everywhere litellm is called inside Strix. litellm already supports this env var globally, so a one-line change at the top of memory_compressor.py (or wherever the litellm client is constructed) to honour os.environ.get("LITELLM_REQUEST_TIMEOUT", 30) would let users raise the timeout without rebuilding Strix.
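A minimal sketch of that environment lookup, assuming the value is a number of seconds and that an unset, non-numeric, or non-positive value should fall back to the current 30-second default. The helper name resolve_request_timeout is hypothetical; only the LITELLM_REQUEST_TIMEOUT variable name comes from the paragraph above.

```python
import math
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_SEC = 30.0  # the current default described in this issue


def resolve_request_timeout(env: Optional[Mapping[str, str]] = None) -> float:
    """Return the timeout from LITELLM_REQUEST_TIMEOUT, or the default if unusable."""
    env = os.environ if env is None else env
    raw = env.get("LITELLM_REQUEST_TIMEOUT", "").strip()
    if not raw:
        return DEFAULT_TIMEOUT_SEC
    try:
        value = float(raw)
    except ValueError:
        # Non-numeric input: keep the default rather than crash at start-up.
        return DEFAULT_TIMEOUT_SEC
    # Reject zero, negative, NaN, and infinite values.
    if not math.isfinite(value) or value <= 0:
        return DEFAULT_TIMEOUT_SEC
    return value
```

The result could then be passed as timeout=resolve_request_timeout() to the completion() call. Falling back silently on bad input is a design choice for this sketch; logging a warning in that branch may be preferable in Strix itself.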
Workaround until a fix lands
- Always launch Strix with LITELLM_REQUEST_TIMEOUT=300 in the environment.
- Run a periodic health-check that watches the scan log for staleness; on ≥30 min stale + this exact traceback signature, surface to the user and ask whether to kill cleanly. Findings already written to vulnerabilities/vuln-*.md are usable independently of clean termination.
- Never auto-kill — always confirm with the user first, since the agent could in principle recover during a long Anthropic slow-down.
This works around the symptom but doesn't fix the underlying retry loop, which is why I'm filing this issue.
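The health-check can be sketched roughly as follows. This is an illustration under assumptions, not the script I actually run: the function names, the three status strings, and the 64 KB tail size are all hypothetical, and the file that carries the traceback may differ from the scan log, so the sketch accepts a separate error_log path. It only classifies the run and never kills anything, in line with the "never auto-kill" rule.

```python
import os
import time
from typing import Optional

STALE_AFTER_SEC = 30 * 60                 # staleness threshold from the workaround
SIGNATURE = "litellm.exceptions.Timeout"  # exception name taken from the traceback
TAIL_BYTES = 64 * 1024                    # assumption: only the end of the file matters


def read_tail(path: str, max_bytes: int = TAIL_BYTES) -> str:
    """Read at most max_bytes from the end of a file, tolerating bad encoding."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        fh.seek(max(0, fh.tell() - max_bytes))
        return fh.read().decode("utf-8", errors="replace")


def check_scan(scan_log: str, error_log: Optional[str] = None,
               now: Optional[float] = None) -> str:
    """Classify a run as 'healthy', 'stale', or 'stuck-on-timeout'."""
    now = time.time() if now is None else now
    # Staleness is judged by the scan log's last-modified time.
    if now - os.path.getmtime(scan_log) < STALE_AFTER_SEC:
        return "healthy"
    # The signature is looked for in error_log if given, else in the scan log.
    tail = read_tail(error_log or scan_log)
    return "stuck-on-timeout" if SIGNATURE in tail else "stale"
```

A scheduler could call check_scan() every few minutes and notify the user on "stuck-on-timeout", leaving the decision to kill the process to a human.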
Environment
- Strix: 0.8.2 (have not yet retested on 0.8.3; happy to do so if a fix is already in flight)
- Sandbox image: ghcr.io/usestrix/strix-sandbox:0.1.12
- LLM: anthropic/claude-sonnet-4-6 via litellm
- Host: macOS 25.3.0, Apple Silicon, 16 GB RAM
- Docker Desktop running, ~7.65 GiB allocated to the sandbox
- Reproduced on a 4-target standard-mode multi-target run, ~3h wall-clock to onset of freeze, roughly 10 findings produced before freeze
Severity (in my view)
Medium-to-high for any workflow that runs Strix on long multi-target standard-mode scans without manual babysitting. The findings already on disk are usable, so this isn't a data-loss bug — but it is a workflow killer for autonomous runs, and it consumes Anthropic tokens during the retry loop without producing value.
Happy to share more of the scan log on request, run repro on 0.8.3 if a fix has already shipped there, or test a candidate patch.