
[BUG] Local-code targets silently fail to copy when target dir is large (/workspace/ ends up empty) #492

@frankea

title: "[BUG] Local-code targets silently fail to copy when target dir is large (/workspace/ ends up empty, agent reports 'No such file or directory')"
labels: bug

Describe the bug

When --target points at a local directory, Strix tars the entire directory into an in-memory BytesIO and ships it to the sandbox container via container.put_archive("/workspace", ...). The whole operation is wrapped in except (OSError, DockerException): pass, so any failure that surfaces as one of those exception types (a Docker API body-size limit, a connection dropped mid-upload, a permission error on a single file, etc.) is swallowed silently. Memory pressure from the in-memory tar is another candidate, though a plain Python MemoryError is not an OSError and would propagate rather than be swallowed by this handler. The container starts, /workspace/ stays empty, and the agent proceeds in a degraded "black-box" mode reporting things like:

$ ls -la /workspace/<target>/
ls: cannot access '/workspace/<target>/': No such file or directory

$ ls -la /workspace/
total 0
drwxr-xr-x 1 pentester pentester  0 <date> .
drwxr-xr-x 1 root      root      32 <date> ..

The user is given no indication that the codebase was never copied in. The session ends with Vulnerabilities 0 (No exploitable vulnerabilities detected) despite no analysis having occurred.

Code path: strix/runtime/docker_runtime.py::_copy_local_directory_to_container (still present on main as of this report):

def _copy_local_directory_to_container(
    self, container: Container, local_path: str, target_name: str | None = None
) -> None:
    try:
        local_path_obj = Path(local_path).resolve()
        if not local_path_obj.exists() or not local_path_obj.is_dir():
            return

        tar_buffer = BytesIO()
        with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
            for item in local_path_obj.rglob("*"):
                if item.is_file():
                    rel_path = item.relative_to(local_path_obj)
                    arcname = Path(target_name) / rel_path if target_name else rel_path
                    tar.add(item, arcname=arcname)

        tar_buffer.seek(0)
        container.put_archive("/workspace", tar_buffer.getvalue())
        container.exec_run(
            "chown -R pentester:pentester /workspace && chmod -R 755 /workspace",
            user="root",
        )
    except (OSError, DockerException):
        pass

Three problems here:

  1. Unbounded in-memory tar. A real project repo can easily be multi-GB once build/, node_modules/, .venv/, DerivedData/, .git/, IDE caches, coverage artifacts, etc. are walked by rglob("*"). The full tar is materialized in a BytesIO and then passed to put_archive as bytes, so the entire payload has to fit in RAM (possibly more than once, depending on whether getvalue() and the HTTP layer copy it) at the moment of upload.
  2. No exclude/ignore mechanism. Build outputs, VCS metadata, virtualenvs, and dependency caches are all happily walked and tarred. None of them are relevant to a security scan, but they dominate the size.
  3. Silent failure. The except (OSError, DockerException): pass handler means the user has no signal that anything went wrong. There's no log line, no warning, no exit code change: just an empty /workspace/. The early return when local_path does not exist or is not a directory is equally silent.
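
To see which directories dominate the walk before any tar is built, a rough per-directory size breakdown helps. The following is a minimal sketch and not part of Strix: size_by_top_level is a hypothetical helper name, and it sums on-disk sizes via os.walk, so the figures only approximate what rglob("*") plus tar.add would produce.

```python
import os
from pathlib import Path


def size_by_top_level(root: str) -> dict[str, int]:
    """Return total file bytes grouped by top-level entry of `root`."""
    root_path = Path(root).resolve()
    totals: dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root_path):
        rel = Path(dirpath).relative_to(root_path)
        # Files that sit directly in the root are grouped under ".".
        top = rel.parts[0] if rel.parts else "."
        for name in filenames:
            try:
                # lstat: count a symlink itself, not the file it points to.
                size = os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            totals[top] = totals.get(top, 0) + size
    return totals
```

Sorting the result of size_by_top_level("/path/to/repo") by value should put entries such as build or node_modules at the top on an affected repo.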

To Reproduce

  1. Point --target at a real-world project directory of ~5GB or more (in my case: a multi-language repo with a build-output directory of ~5.1GB and assorted tool/cache directories of ~400MB; total ~6.2GB).
    du -sh /path/to/repo
    # 6.2G
    strix --target /path/to/repo
  2. Wait for the sandbox to come up.
  3. Observe in the agent transcript that ls -la /workspace/ shows only the . and .. entries; no subdirectory for the target was created.
  4. Verify no error was surfaced (and rule out a bind-mount issue — Strix uses put_archive, not mounts):
    docker inspect strix-scan-<id> --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{"\n"}}{{end}}'
    # (empty)
  5. Workaround that confirms the cause: stage a slim copy without build artifacts and re-run:
    rsync -a \
      --exclude='build/' --exclude='.git/' --exclude='.venv/' \
      --exclude='node_modules/' --exclude='__pycache__/' \
      --exclude='.mypy_cache/' \
      /path/to/repo/ /tmp/repo-slim/
    du -sh /tmp/repo-slim   # ~75MB in my case
    strix --target /tmp/repo-slim
    With the slimmed copy, /workspace/repo-slim/ is populated correctly and the scan proceeds normally.
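
For anyone without rsync at hand, the same staging step can be sketched in Python with shutil.copytree. This is an illustration only: stage_slim_copy is a hypothetical helper, and unlike rsync's trailing-slash patterns, shutil.ignore_patterns matches files as well as directories with these names.

```python
import shutil
from pathlib import Path

# Same names as the rsync excludes above, without the trailing slash.
EXCLUDES = ("build", ".git", ".venv", "node_modules", "__pycache__", ".mypy_cache")


def stage_slim_copy(src: str, dst: str) -> Path:
    """Copy `src` to `dst`, skipping any entry whose name is in EXCLUDES.

    `dst` must not exist yet; copytree raises FileExistsError otherwise.
    """
    shutil.copytree(
        src,
        dst,
        ignore=shutil.ignore_patterns(*EXCLUDES),
        symlinks=True,  # keep symlinks as links instead of following them
    )
    return Path(dst)
```

After stage_slim_copy("/path/to/repo", "/tmp/repo-slim"), the scan can be pointed at /tmp/repo-slim as in step 5.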

Expected behavior

  • If the copy fails for any reason, Strix should surface a clear error to the user (e.g. "Failed to copy /path/to/repo into sandbox: tar archive exceeded 4GB / OOMed / docker rejected put_archive") and either retry with sensible excludes or exit non-zero. The current silent fallback to an empty workspace is the worst outcome — the agent runs to completion, charges tokens, and reports no findings, leaving the user to assume the codebase is clean.
  • For directories of realistic project size, Strix should not require the user to pre-slim the input.
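
One way to guarantee the first point, independent of how the copy itself is implemented, is a post-copy check that the target directory actually exists and is non-empty inside the sandbox. The sketch below is hypothetical (WorkspaceCopyError and verify_workspace_populated are not Strix names); it assumes a docker-py style container whose exec_run returns an object with an exit_code attribute, and that the sandbox image provides sh.

```python
class WorkspaceCopyError(RuntimeError):
    """Hypothetical error: the target code never reached the sandbox."""


def verify_workspace_populated(container, target_name: str) -> None:
    """Raise if /workspace/<target_name> is missing or empty in the container."""
    path = f"/workspace/{target_name}"
    # $1 is the path; the check passes only for a non-empty directory.
    script = 'test -d "$1" && [ -n "$(ls -A "$1")" ]'
    result = container.exec_run(["sh", "-c", script, "sh", path])
    if result.exit_code != 0:
        raise WorkspaceCopyError(
            f"{path} is missing or empty after copy; refusing to start "
            "a white-box scan against an empty workspace"
        )
```

Calling this right after the copy step would turn the empty-workspace case into a hard failure rather than a scan that reports zero findings.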

Suggested fixes (each independently useful)

  1. Don't swallow the exception. At minimum, logger.exception(...) it and re-raise (or fail the scan with a clear message). Treat "couldn't get the code into the sandbox" as fatal in white-box mode.
  2. Stream the tar instead of buffering. Write to a tempfile.NamedTemporaryFile (or a pipe) and pass that to put_archive. Avoids the double-RAM peak and lets you handle multi-GB inputs.
  3. Honor .gitignore and a built-in default exclude list. A sensible default like {.git, .venv, venv, env, node_modules, build, dist, target, __pycache__, .mypy_cache, .pytest_cache, .ruff_cache, htmlcov, coverage.xml, .DS_Store, *.pyc} would catch 99% of cases. Optionally honor a .strixignore for project-specific tuning.
  4. Pre-flight size check + warning. Before tarring, walk the tree once and warn if the candidate set exceeds, say, 1GB; offer to apply the default excludes.
  5. Consider bind-mounting on Linux/macOS Docker setups as an alternative path that sidesteps both the size and the put_archive API limit. (put_archive accepts arbitrary tar payloads in principle, but in practice it appears to scale poorly with multi-GB uploads.)

Screenshots

N/A — failure mode is the absence of output, which is the point.

System Information:

  • OS: macOS 26.3.1 (Apple Silicon), OrbStack as the Docker runtime
  • Strix Version: 0.8.3 (sandbox image ghcr.io/usestrix/strix-sandbox:0.1.13); also verified the same code path on main at the time of filing
  • Python Version: 3.14.5 (installed via pipx)
  • LLM Used: OpenAI (default)

Additional context
