title: "[BUG] Local-code targets silently fail to copy when target dir is large (/workspace/ ends up empty, agent reports 'No such file or directory')"
labels: bug
Describe the bug
When --target points at a local directory, Strix tars the entire directory into an in-memory BytesIO and ships it to the sandbox container via container.put_archive("/workspace", ...). The whole operation is wrapped in except (OSError, DockerException): pass, so any failure — OOM on the in-memory tar, Docker API body-size limit, permission error on a single file, etc. — is swallowed silently. The container starts, /workspace/ stays empty, and the agent proceeds in a degraded "black-box" mode reporting things like:
$ ls -la /workspace/<target>/
ls: cannot access '/workspace/<target>/': No such file or directory
$ ls -la /workspace/
total 0
drwxr-xr-x 1 pentester pentester 0 <date> .
drwxr-xr-x 1 root root 32 <date> ..
The user is given no indication that the codebase was never copied in. The session ends with Vulnerabilities 0 (No exploitable vulnerabilities detected) despite no analysis having occurred.
Code path: strix/runtime/docker_runtime.py::_copy_local_directory_to_container (still present on main as of this report):
def _copy_local_directory_to_container(
    self, container: Container, local_path: str, target_name: str | None = None
) -> None:
    try:
        local_path_obj = Path(local_path).resolve()
        if not local_path_obj.exists() or not local_path_obj.is_dir():
            return
        tar_buffer = BytesIO()
        with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
            for item in local_path_obj.rglob("*"):
                if item.is_file():
                    rel_path = item.relative_to(local_path_obj)
                    arcname = Path(target_name) / rel_path if target_name else rel_path
                    tar.add(item, arcname=arcname)
        tar_buffer.seek(0)
        container.put_archive("/workspace", tar_buffer.getvalue())
        container.exec_run(
            "chown -R pentester:pentester /workspace && chmod -R 755 /workspace",
            user="root",
        )
    except (OSError, DockerException):
        pass
Three problems here:
- Unbounded in-memory tar. A real project repo can easily be multi-GB once build/, node_modules/, .venv/, DerivedData/, .git/, IDE caches, coverage artifacts, etc. are walked by rglob("*"). The full tar is materialized in a BytesIO and then passed to put_archive as bytes, so the whole payload sits in RAM, possibly twice, at the moment of upload.
- No exclude/ignore mechanism. Build outputs, VCS metadata, virtualenvs, and dependency caches are all happily walked and tarred. None of them are relevant to a security scan, but they dominate the size.
- Silent failure. The except (OSError, DockerException): pass handler means the user has no signal that anything went wrong. There's no log line, no warning, no exit code change, just an empty /workspace/.
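To see which directories would dominate the tar, the same rglob("*") walk can be used to total file sizes per top-level entry. This is a minimal diagnostic sketch, not part of Strix; the function name size_by_top_level is made up for illustration.

```python
from collections import Counter
from pathlib import Path


def size_by_top_level(root: str) -> list[tuple[str, int]]:
    """Return (top-level entry, total bytes) pairs, largest first.

    Counts regular files only, mirroring what the copy routine
    quoted above would add to the tar.
    """
    root_path = Path(root).resolve()
    totals: Counter[str] = Counter()
    for item in root_path.rglob("*"):
        try:
            if item.is_file():
                top = item.relative_to(root_path).parts[0]
                totals[top] += item.stat().st_size
        except OSError:
            # Unreadable entries are skipped in this diagnostic;
            # a real fix should report them instead.
            continue
    return totals.most_common()
```

Calling size_by_top_level("/path/to/repo") and printing the first few pairs should make it obvious when a build-output or cache directory accounts for most of the payload.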
To Reproduce
- Point --target at a real-world project directory of ~5GB or more (in my case: a multi-language repo with a build-output directory of ~5.1GB and assorted tool/cache directories of ~400MB; total ~6.2GB).
du -sh /path/to/repo
# 6.2G
strix --target /path/to/repo
- Wait for the sandbox to come up.
- Observe in the agent transcript that ls /workspace/ shows only the . and .. entries. No subdirectory for the target was created.
- Verify no error was surfaced (and rule out a bind-mount issue; Strix uses put_archive, not mounts):
docker inspect strix-scan-<id> --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{"\n"}}{{end}}'
# (empty)
- Workaround that confirms the cause: stage a slim copy without build artifacts and re-run:
rsync -a \
--exclude='build/' --exclude='.git/' --exclude='.venv/' \
--exclude='node_modules/' --exclude='__pycache__/' \
--exclude='.mypy_cache/' \
/path/to/repo/ /tmp/repo-slim/
du -sh /tmp/repo-slim # ~75MB in my case
strix --target /tmp/repo-slim
With the slimmed copy, /workspace/repo-slim/ is populated correctly and the scan proceeds normally.
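For anyone without rsync, roughly the same staging step can be done from Python with shutil.copytree. This is a sketch under a few assumptions: the helper name stage_slim_copy is hypothetical, the destination must not already exist, and ignore_patterns matches both files and directories by name at any depth, which is slightly broader than the directory-only rsync excludes above.

```python
import shutil
from pathlib import Path

# Same names as the rsync excludes above; matched against each entry name.
EXCLUDES = ("build", ".git", ".venv", "node_modules", "__pycache__", ".mypy_cache")


def stage_slim_copy(src: str, dst: str) -> Path:
    """Copy src to dst, skipping any entry whose name matches EXCLUDES."""
    shutil.copytree(
        src,
        dst,
        symlinks=True,  # keep symlinks as symlinks, as rsync -a does
        ignore=shutil.ignore_patterns(*EXCLUDES),
    )
    return Path(dst)
```

After stage_slim_copy("/path/to/repo", "/tmp/repo-slim"), the scan can be pointed at /tmp/repo-slim as in the workaround.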
Expected behavior
- If the copy fails for any reason, Strix should surface a clear error to the user (e.g. "Failed to copy /path/to/repo into sandbox: tar archive exceeded 4GB / OOMed / docker rejected put_archive") and either retry with sensible excludes or exit non-zero. The current silent fallback to an empty workspace is the worst outcome: the agent runs to completion, consumes tokens, and reports no findings, leaving the user to assume the codebase is clean.
- For directories of realistic project size, Strix should not require the user to pre-slim the input.
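A minimal sketch of what "surface a clear error" could look like, independent of how the copy itself is implemented. SandboxCopyError and copy_or_fail are hypothetical names, not existing Strix APIs; the copy step is passed in as a callable so the wrapper can be shown without Docker.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger(__name__)


class SandboxCopyError(RuntimeError):
    """Hypothetical error type: the target never made it into the sandbox."""


def copy_or_fail(copy_fn: Callable[[], None], local_path: str) -> None:
    """Run copy_fn and turn any failure into a logged, fatal error."""
    try:
        copy_fn()
    except Exception as exc:
        # A real fix might narrow this to OSError, DockerException and MemoryError.
        logger.exception("Failed to copy %s into sandbox", local_path)
        raise SandboxCopyError(
            f"Failed to copy {local_path} into sandbox: {exc}"
        ) from exc
```

The caller can then decide whether to retry with excludes or exit non-zero, but it can no longer continue with an empty workspace unnoticed.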
Suggested fixes (each independently useful)
- Don't swallow the exception. At minimum, logger.exception(...) it and re-raise (or fail the scan with a clear message). Treat "couldn't get the code into the sandbox" as fatal in white-box mode.
- Stream the tar instead of buffering. Write to a tempfile.NamedTemporaryFile (or a pipe) and pass that to put_archive. Avoids the double-RAM peak and lets you handle multi-GB inputs.
- Honor .gitignore and a built-in default exclude list. A sensible default like {.git, .venv, venv, env, node_modules, build, dist, target, __pycache__, .mypy_cache, .pytest_cache, .ruff_cache, htmlcov, coverage.xml, .DS_Store, *.pyc} would cover the most common cases. Optionally honor a .strixignore for project-specific tuning.
- Pre-flight size check + warning. Before tarring, walk the tree once and warn if the candidate set exceeds, say, 1GB; offer to apply the default excludes.
- Consider bind-mounting on Linux/macOS Docker setups as an alternative path that sidesteps both the size and the put_archive API limit. (put_archive is documented to be capable but in practice has scaling issues with large payloads.)
Screenshots
N/A — failure mode is the absence of output, which is the point.
System Information:
- OS: macOS 26.3.1 (Apple Silicon), OrbStack as the Docker runtime
- Strix Version: 0.8.3 (sandbox image ghcr.io/usestrix/strix-sandbox:0.1.13); also verified the same code path on main at the time of filing
- Python Version: 3.14.5 (installed via pipx)
- LLM Used: OpenAI (default)
Additional context
Searched existing issues for put_archive, _copy_local_directory, BytesIO, rglob, "workspace empty", and "local code": no prior reports.