[BUG] Local-code targets silently fail to copy when target dir is large (/workspace/ ends up empty)

---
title: "[BUG] Local-code targets silently fail to copy when target dir is large (`/workspace/` ends up empty, agent reports 'No such file or directory')"
labels: bug
---

**Describe the bug**

When `--target` points at a local directory, Strix tars the entire directory into an in-memory `BytesIO` and ships it to the sandbox container via `container.put_archive("/workspace", ...)`. The whole operation is wrapped in `except (OSError, DockerException): pass`, so any failure that surfaces as one of those exception types (memory pressure from the in-memory tar breaking the upload, a Docker API body-size limit, a permission error on a single file, etc.) is **swallowed silently**. Note that a Python-level `MemoryError` is neither an `OSError` nor a `DockerException`, so it would not be caught by this clause. The container starts, `/workspace/` stays empty, and the agent proceeds in a degraded "black-box" mode reporting things like:

```
$ ls -la /workspace/<target>/
ls: cannot access '/workspace/<target>/': No such file or directory

$ ls -la /workspace/
total 0
drwxr-xr-x 1 pentester pentester  0 <date> .
drwxr-xr-x 1 root      root      32 <date> ..
```

The user is given no indication that the codebase was never copied in. The session ends with `Vulnerabilities 0 (No exploitable vulnerabilities detected)` despite no analysis having occurred.

Code path: `strix/runtime/docker_runtime.py::_copy_local_directory_to_container` (still present on `main` as of this report):

```python
def _copy_local_directory_to_container(
    self, container: Container, local_path: str, target_name: str | None = None
) -> None:
    try:
        local_path_obj = Path(local_path).resolve()
        if not local_path_obj.exists() or not local_path_obj.is_dir():
            return

        tar_buffer = BytesIO()
        with tarfile.open(fileobj=tar_buffer, mode="w") as tar:
            for item in local_path_obj.rglob("*"):
                if item.is_file():
                    rel_path = item.relative_to(local_path_obj)
                    arcname = Path(target_name) / rel_path if target_name else rel_path
                    tar.add(item, arcname=arcname)

        tar_buffer.seek(0)
        container.put_archive("/workspace", tar_buffer.getvalue())
        container.exec_run(
            "chown -R pentester:pentester /workspace && chmod -R 755 /workspace",
            user="root",
        )
    except (OSError, DockerException):
        pass
```

Three problems here:

1. **Unbounded in-memory tar.** A real project repo can easily be multi-GB once `build/`, `node_modules/`, `.venv/`, `DerivedData/`, `.git/`, IDE caches, coverage artifacts, etc. are walked by `rglob("*")`. The full tar is materialized in a `BytesIO` and then passed to `put_archive` as `bytes` — so the payload exists twice in RAM at the moment of upload.
2. **No exclude/ignore mechanism.** Build outputs, VCS metadata, virtualenvs, and dependency caches are all happily walked and tarred. None of them are relevant to a security scan, but they dominate the size.
3. **Silent failure.** The `except (OSError, DockerException): pass` handler means the user has no signal that anything went wrong. There's no log line, no warning, no exit code change, just an empty `/workspace/`. The early `return` for a missing or non-directory path is equally silent.
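To gauge how much problems 1 and 2 contribute for a given repo, the walk can be reproduced outside Strix. The following is a minimal sketch, not Strix code: `tree_size_report` and `DEFAULT_EXCLUDES` are hypothetical names, and the exclude list is only an example. It totals file sizes over the same `rglob("*")` walk, grouped by top-level entry, with and without the excluded directories.

```python
from collections import Counter
from pathlib import Path

# Hypothetical example list; not something Strix ships today.
DEFAULT_EXCLUDES = {".git", ".venv", "node_modules", "build", "__pycache__", ".mypy_cache"}


def tree_size_report(
    root: str, excludes: set[str] = DEFAULT_EXCLUDES
) -> tuple[Counter, int, int]:
    """Return (bytes per top-level entry, total bytes, bytes left after excludes)."""
    root_path = Path(root).resolve()
    per_top: Counter = Counter()
    total = kept = 0
    for item in root_path.rglob("*"):  # same walk as the current copy code
        if not item.is_file():
            continue
        try:
            size = item.stat().st_size
        except OSError:
            continue  # unreadable entry; skipped here, but worth reporting in real code
        rel = item.relative_to(root_path)
        per_top[rel.parts[0]] += size
        total += size
        # Keep the file only if no path component is in the exclude set.
        if not excludes.intersection(rel.parts):
            kept += size
    return per_top, total, kept
```

Calling `tree_size_report("/path/to/repo")` and printing `per_top.most_common(5)` shows which directories dominate the payload. The gap between `total` and `kept` is roughly what an exclude list would save.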

**To Reproduce**

1. Point `--target` at a real-world project directory of ~5GB or more (in my case: a multi-language repo with a build-output directory of ~5.1GB and assorted tool/cache directories of ~400MB; total ~6.2GB).
   ```bash
   du -sh /path/to/repo
   # 6.2G
   strix --target /path/to/repo
   ```
2. Wait for the sandbox to come up.
3. Observe in the agent transcript that `ls /workspace/` shows only `.` and `..`. No subdirectory for the target was created.
4. Verify no error was surfaced (and rule out a bind-mount issue — Strix uses `put_archive`, not mounts):
   ```bash
   docker inspect strix-scan-<id> --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{"\n"}}{{end}}'
   # (empty)
   ```
5. **Workaround that confirms the cause:** stage a slim copy without build artifacts and re-run:
   ```bash
   rsync -a \
     --exclude='build/' --exclude='.git/' --exclude='.venv/' \
     --exclude='node_modules/' --exclude='__pycache__/' \
     --exclude='.mypy_cache/' \
     /path/to/repo/ /tmp/repo-slim/
   du -sh /tmp/repo-slim   # ~75MB in my case
   strix --target /tmp/repo-slim
   ```
   With the slimmed copy, `/workspace/repo-slim/` is populated correctly and the scan proceeds normally.

**Expected behavior**

- If the copy fails for any reason, Strix should surface a clear error to the user (e.g. "Failed to copy `/path/to/repo` into sandbox: tar archive exceeded 4GB / OOMed / docker rejected put_archive") and either retry with sensible excludes or exit non-zero. The current silent fallback to an empty workspace is the worst outcome — the agent runs to completion, charges tokens, and reports no findings, leaving the user to assume the codebase is clean.
- For directories of realistic project size, Strix should not require the user to pre-slim the input.

**Suggested fixes** (each independently useful)

1. **Don't swallow the exception.** At minimum, `logger.exception(...)` it and re-raise (or fail the scan with a clear message). Treat "couldn't get the code into the sandbox" as fatal in white-box mode.
2. **Stream the tar instead of buffering.** Write to a `tempfile.NamedTemporaryFile` (or a pipe) and pass that to `put_archive`. Avoids the double-RAM peak and lets you handle multi-GB inputs.
3. **Honor `.gitignore` and a built-in default exclude list.** A sensible default like `{.git, .venv, venv, env, node_modules, build, dist, target, __pycache__, .mypy_cache, .pytest_cache, .ruff_cache, htmlcov, coverage.xml, .DS_Store, *.pyc}` would cover the bulk of the size in most projects. Optionally honor a `.strixignore` for project-specific tuning.
4. **Pre-flight size check + warning.** Before tarring, walk the tree once and warn if the candidate set exceeds, say, 1GB; offer to apply the default excludes.
5. **Consider bind-mounting on Linux/macOS Docker setups** as an alternative path that sidesteps both the size and the `put_archive` API limit. (`put_archive` accepts arbitrary tar data, but this report suggests it does not scale well to multi-GB payloads.)

**Screenshots**

N/A — failure mode is the *absence* of output, which is the point.

**System Information:**
- OS: macOS 26.3.1 (Apple Silicon), OrbStack as the Docker runtime
- Strix Version: 0.8.3 (sandbox image `ghcr.io/usestrix/strix-sandbox:0.1.13`); also verified the same code path on `main` at the time of filing
- Python Version: 3.14.5 (installed via `pipx`)
- LLM Used: OpenAI (default)

**Additional context**

- Searched existing issues/PRs for `put_archive`, `_copy_local_directory`, `BytesIO`, `rglob`, "workspace empty", "local code" — no prior reports.
- Issue #287 ("Expand source-aware testing support") is adjacent but about adding SAST tooling, not about the copy mechanism.
- Happy to send a PR with any combination of the fixes above (preference: fixes 1 and 3 as a minimum, fix 2 as a follow-up). I'd like to confirm the maintainers' preferred direction before opening one.

