Constrain server data file paths by fallintoplace · Pull Request #1423 · NVIDIA/cuopt

fallintoplace · 2026-06-10T21:42:30Z

Summary

The local-file request path for cuopt-data-file now enforces the documented CUOPT_DATA_DIR boundary.

Previously the server joined CUOPT_DATA_DIR with the header value and only checked existence. Absolute paths and .. segments could resolve outside the configured data directory, and symlinks could do the same.

This change rejects absolute input paths, resolves both the configured root and requested file with realpath, verifies the result stays under CUOPT_DATA_DIR, and requires the target to be a regular file.

Validation

python3 -m py_compile python/cuopt_server/cuopt_server/webserver.py python/cuopt_server/cuopt_server/tests/test_file_path_validation.py
git diff --check
uvx --with ruff ruff check python/cuopt_server/cuopt_server/webserver.py python/cuopt_server/cuopt_server/tests/test_file_path_validation.py
PYTHONPATH=python/cuopt_server uvx --with 'pytest<9.0' --with fastapi --with jsonref==1.1.0 --with msgpack==1.1.2 --with msgpack-numpy==0.4.8 --with psutil --with uvicorn --with numpy --with pandas pytest python/cuopt_server/cuopt_server/tests/test_file_path_validation.py -q

The pytest run passed with existing Pydantic deprecation warnings during server import.

copy-pr-bot · 2026-06-10T21:42:33Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-10T21:47:06Z

📝 Walkthrough

Walkthrough

This PR rewrites the validate_file_path function to enforce CUOPT data directory containment, rejecting absolute paths and directory-traversal/symlink-based escapes, with comprehensive test coverage verifying all validation scenarios and error conditions.

Changes

File Path Validation Security

Layer / File(s)	Summary
Path validation implementation `python/cuopt_server/cuopt_server/webserver.py`	`validate_file_path` now validates `CUOPT_DATA_DIR` is configured, rejects absolute input paths, resolves real paths and enforces containment within the data directory using `os.path.commonpath` to prevent directory traversal and symlink escapes, confirms the resolved path is an existing file, and raises HTTP 400 with specific, clear error messages.
Path validation test suite `python/cuopt_server/cuopt_server/tests/test_file_path_validation.py`	Autouse fixture saves and restores settings state; four test cases verify: valid relative paths within data directory return the absolute file path; absolute paths are rejected; parent-directory traversal escapes are rejected; symlinks resolving outside the data directory are rejected. All rejections assert HTTP 400 status and expected error message content.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: constraining server data file paths to enforce CUOPT_DATA_DIR boundaries, which is the core objective of the PR.
Description check	✅ Passed	The description is directly related to the changeset, explaining the security issue, the solution implemented, and validation steps performed.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

python/cuopt_server/cuopt_server/webserver.py (1)
1047-1052: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

The boundary check is still bypassable via TOCTOU.

Line 1051 validates the pathname, but Lines 1142-1154 enqueue only the string for later use in another worker. If CUOPT_DATA_DIR is writable by an untrusted process, the validated file can be replaced with a different file or symlink after the check and before the solver opens it, which reintroduces the escape this PR is trying to close. Open the file here and pass an fd/bytes, or re-resolve and validate again at the actual open site.

Also applies to: 1142-1154
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt_server/cuopt_server/webserver.py` around lines 1047 - 1052, The
code validates the pathname via validate_file_path(cuopt_data_file) and then
later enqueues only the string (file_path), leaving a TOCTOU window if
CUOPT_DATA_DIR is writable; fix by opening the file immediately after
validate_file_path (e.g., obtain a file descriptor or read bytes) and pass that
safe handle/data into the queue instead of the filename, or if you must enqueue
a path, re-resolve and re-run validate_file_path at the actual open site (the
worker that processes the queue, i.e., where you currently handle lines
~1142-1154) and ensure you use openat with the directory fd or verify inode/dev
after open to prevent symlink/race attacks.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cuopt_server/cuopt_server/tests/test_file_path_validation.py`:
- Around line 22-74: Add tests in test_file_path_validation.py to cover the two
missing branches in webserver.validate_file_path: (1) the CUOPT_DATA_DIR-unset
path by clearing/unsetting settings.set_data_dir (or not calling it) and calling
validate_file_path to assert it raises the expected HTTPException/status and
message about CUOPT_DATA_DIR being unset; and (2) the "must be a regular file"
path by creating a non-regular target inside the data dir (e.g., a subdirectory
or FIFO) and calling validate_file_path on that name to assert it raises
HTTPException with the "regular file" (or similar) detail. Use the existing test
helpers and exception assertions (pytest.raises and exc_info.value.detail) and
reference validate_file_path and settings.set_data_dir so the new tests exercise
those branches.

In `@python/cuopt_server/cuopt_server/webserver.py`:
- Around line 216-220: The log call leaks the resolved internal filesystem path
via file_path; update the error logging in webserver.py so it does not include
file_path (use cuopt_data_file or a generic message instead) and ensure any
HTTPException detail or other logs never expose the resolved server path; locate
the check using file_path and cuopt_data_file and replace the
logging.error(f"File path '{file_path}'...") with a safe message that references
only cuopt_data_file or a non-path generic string.

---

Outside diff comments:
In `@python/cuopt_server/cuopt_server/webserver.py`:
- Around line 1047-1052: The code validates the pathname via
validate_file_path(cuopt_data_file) and then later enqueues only the string
(file_path), leaving a TOCTOU window if CUOPT_DATA_DIR is writable; fix by
opening the file immediately after validate_file_path (e.g., obtain a file
descriptor or read bytes) and pass that safe handle/data into the queue instead
of the filename, or if you must enqueue a path, re-resolve and re-run
validate_file_path at the actual open site (the worker that processes the queue,
i.e., where you currently handle lines ~1142-1154) and ensure you use openat
with the directory fd or verify inode/dev after open to prevent symlink/race
attacks.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: cadfe213-c21f-4a01-af0b-1bfba8893d48

📥 Commits

Reviewing files that changed from the base of the PR and between c99a5ce and 0c5676c.

📒 Files selected for processing (2)

python/cuopt_server/cuopt_server/tests/test_file_path_validation.py
python/cuopt_server/cuopt_server/webserver.py

coderabbitai · 2026-06-10T21:47:09Z

+def test_validate_file_path_returns_file_if_relative_path_stays_in_data_dir(
+    tmp_path,
+):
+    data_dir = tmp_path / "data"
+    data_dir.mkdir()
+    data_file = data_dir / "input.json"
+    data_file.write_text("{}", encoding="utf-8")
+    settings.set_data_dir(str(data_dir))
+
+    assert webserver.validate_file_path("input.json") == str(data_file)
+
+
+def test_validate_file_path_rejects_absolute_path(tmp_path):
+    data_dir = tmp_path / "data"
+    data_dir.mkdir()
+    outside_file = tmp_path / "input.json"
+    outside_file.write_text("{}", encoding="utf-8")
+    settings.set_data_dir(str(data_dir))
+
+    with pytest.raises(HTTPException) as exc_info:
+        webserver.validate_file_path(str(outside_file))
+
+    assert exc_info.value.status_code == 400
+    assert "relative to CUOPT_DATA_DIR" in exc_info.value.detail
+
+
+def test_validate_file_path_rejects_parent_directory_escape(tmp_path):
+    data_dir = tmp_path / "data"
+    data_dir.mkdir()
+    outside_file = tmp_path / "input.json"
+    outside_file.write_text("{}", encoding="utf-8")
+    settings.set_data_dir(str(data_dir))
+
+    with pytest.raises(HTTPException) as exc_info:
+        webserver.validate_file_path("../input.json")
+
+    assert exc_info.value.status_code == 400
+    assert "stay inside CUOPT_DATA_DIR" in exc_info.value.detail
+
+
+def test_validate_file_path_rejects_symlink_escape(tmp_path):
+    data_dir = tmp_path / "data"
+    data_dir.mkdir()
+    outside_file = tmp_path / "input.json"
+    outside_file.write_text("{}", encoding="utf-8")
+    os.symlink(outside_file, data_dir / "linked-input.json")
+    settings.set_data_dir(str(data_dir))
+
+    with pytest.raises(HTTPException) as exc_info:
+        webserver.validate_file_path("linked-input.json")
+
+    assert exc_info.value.status_code == 400
+    assert "stay inside CUOPT_DATA_DIR" in exc_info.value.detail


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Cover the remaining new validation branches.

This suite never exercises the CUOPT_DATA_DIR-unset branch or the new "must be a regular file" branch. A regression in either path would still leave this PR green even though both are part of the new validator behavior.

Suggested tests

+def test_validate_file_path_rejects_when_data_dir_is_unset(): + settings.set_data_dir("") + + with pytest.raises(HTTPException) as exc_info: + webserver.validate_file_path("input.json") + + assert exc_info.value.status_code == 400 + assert "data directory not set" in exc_info.value.detail + + +def test_validate_file_path_rejects_directory_inside_data_dir(tmp_path): + data_dir = tmp_path / "data" + nested_dir = data_dir / "subdir" + nested_dir.mkdir(parents=True) + settings.set_data_dir(str(data_dir)) + + with pytest.raises(HTTPException) as exc_info: + webserver.validate_file_path("subdir") + + assert exc_info.value.status_code == 400 + assert "specified data file does not exist" in exc_info.value.detail

As per coding guidelines, **/*test*.{cpp,py}: "Add test coverage for edge cases (empty, infeasible, unbounded, degenerate) when adding new code paths" and python/**/tests/**: "Regression coverage for fixed bugs".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt_server/cuopt_server/tests/test_file_path_validation.py` around lines 22 - 74, Add tests in test_file_path_validation.py to cover the two missing branches in webserver.validate_file_path: (1) the CUOPT_DATA_DIR-unset path by clearing/unsetting settings.set_data_dir (or not calling it) and calling validate_file_path to assert it raises the expected HTTPException/status and message about CUOPT_DATA_DIR being unset; and (2) the "must be a regular file" path by creating a non-regular target inside the data dir (e.g., a subdirectory or FIFO) and calling validate_file_path on that name to assert it raises HTTPException with the "regular file" (or similar) detail. Use the existing test helpers and exception assertions (pytest.raises and exc_info.value.detail) and reference validate_file_path and settings.set_data_dir so the new tests exercise those branches.

Source: Coding guidelines

coderabbitai · 2026-06-10T21:47:09Z

+    if not os.path.isfile(file_path):
+        logging.error(f"File path '{file_path}' doesn't exist")
        raise HTTPException(
            status_code=400,
-            detail="unable to read "
-            "optimization data from file %s, %s" % (file_path, str(e)),
+            detail=f"specified data file does not exist: {cuopt_data_file}",


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid logging the resolved server path for missing files.

Line 217 logs file_path, which includes the fully resolved CUOPT_DATA_DIR path for a user-controlled miss. That leaks internal filesystem layout into server logs. Log cuopt_data_file or a generic message instead.

Suggested change

- logging.error(f"File path '{file_path}' doesn't exist") + logging.error( + "Requested cuopt data file '%s' does not exist inside CUOPT_DATA_DIR", + cuopt_data_file, + )

As per coding guidelines, python/cuopt_server/**: "No credential or internal-path leakage in error responses or logs".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt_server/cuopt_server/webserver.py` around lines 216 - 220, The log call leaks the resolved internal filesystem path via file_path; update the error logging in webserver.py so it does not include file_path (use cuopt_data_file or a generic message instead) and ensure any HTTPException detail or other logs never expose the resolved server path; locate the check using file_path and cuopt_data_file and replace the logging.error(f"File path '{file_path}'...") with a safe message that references only cuopt_data_file or a non-path generic string.

Source: Coding guidelines

Constrain server data file paths

0c5676c

fallintoplace requested a review from a team as a code owner June 10, 2026 21:42

fallintoplace requested a review from rgsl888prabhu June 10, 2026 21:42

coderabbitai Bot reviewed Jun 10, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Constrain server data file paths#1423

Constrain server data file paths#1423
fallintoplace wants to merge 1 commit into
NVIDIA:mainfrom
fallintoplace:fix/server-data-file-path-boundary

fallintoplace commented Jun 10, 2026

Uh oh!

copy-pr-bot Bot commented Jun 10, 2026

Uh oh!

coderabbitai Bot commented Jun 10, 2026

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 10, 2026

Uh oh!

coderabbitai Bot Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

fallintoplace commented Jun 10, 2026

Summary

Validation

Uh oh!

copy-pr-bot Bot commented Jun 10, 2026

Uh oh!

coderabbitai Bot commented Jun 10, 2026

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 10, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant