Constrain server data file paths#1423
Conversation
📝 WalkthroughWalkthroughThis PR rewrites the ChangesFile Path Validation Security
🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/cuopt_server/cuopt_server/webserver.py (1)
1047-1052:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftThe boundary check is still bypassable via TOCTOU.
Line 1051 validates the pathname, but Lines 1142-1154 enqueue only the string for later use in another worker. If
CUOPT_DATA_DIRis writable by an untrusted process, the validated file can be replaced with a different file or symlink after the check and before the solver opens it, which reintroduces the escape this PR is trying to close. Open the file here and pass an fd/bytes, or re-resolve and validate again at the actual open site.Also applies to: 1142-1154
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt_server/cuopt_server/webserver.py` around lines 1047 - 1052, The code validates the pathname via validate_file_path(cuopt_data_file) and then later enqueues only the string (file_path), leaving a TOCTOU window if CUOPT_DATA_DIR is writable; fix by opening the file immediately after validate_file_path (e.g., obtain a file descriptor or read bytes) and pass that safe handle/data into the queue instead of the filename, or if you must enqueue a path, re-resolve and re-run validate_file_path at the actual open site (the worker that processes the queue, i.e., where you currently handle lines ~1142-1154) and ensure you use openat with the directory fd or verify inode/dev after open to prevent symlink/race attacks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/cuopt_server/cuopt_server/tests/test_file_path_validation.py`:
- Around line 22-74: Add tests in test_file_path_validation.py to cover the two
missing branches in webserver.validate_file_path: (1) the CUOPT_DATA_DIR-unset
path by clearing/unsetting settings.set_data_dir (or not calling it) and calling
validate_file_path to assert it raises the expected HTTPException/status and
message about CUOPT_DATA_DIR being unset; and (2) the "must be a regular file"
path by creating a non-regular target inside the data dir (e.g., a subdirectory
or FIFO) and calling validate_file_path on that name to assert it raises
HTTPException with the "regular file" (or similar) detail. Use the existing test
helpers and exception assertions (pytest.raises and exc_info.value.detail) and
reference validate_file_path and settings.set_data_dir so the new tests exercise
those branches.
In `@python/cuopt_server/cuopt_server/webserver.py`:
- Around line 216-220: The log call leaks the resolved internal filesystem path
via file_path; update the error logging in webserver.py so it does not include
file_path (use cuopt_data_file or a generic message instead) and ensure any
HTTPException detail or other logs never expose the resolved server path; locate
the check using file_path and cuopt_data_file and replace the
logging.error(f"File path '{file_path}'...") with a safe message that references
only cuopt_data_file or a non-path generic string.
---
Outside diff comments:
In `@python/cuopt_server/cuopt_server/webserver.py`:
- Around line 1047-1052: The code validates the pathname via
validate_file_path(cuopt_data_file) and then later enqueues only the string
(file_path), leaving a TOCTOU window if CUOPT_DATA_DIR is writable; fix by
opening the file immediately after validate_file_path (e.g., obtain a file
descriptor or read bytes) and pass that safe handle/data into the queue instead
of the filename, or if you must enqueue a path, re-resolve and re-run
validate_file_path at the actual open site (the worker that processes the queue,
i.e., where you currently handle lines ~1142-1154) and ensure you use openat
with the directory fd or verify inode/dev after open to prevent symlink/race
attacks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: cadfe213-c21f-4a01-af0b-1bfba8893d48
📒 Files selected for processing (2)
python/cuopt_server/cuopt_server/tests/test_file_path_validation.pypython/cuopt_server/cuopt_server/webserver.py
| def test_validate_file_path_returns_file_if_relative_path_stays_in_data_dir( | ||
| tmp_path, | ||
| ): | ||
| data_dir = tmp_path / "data" | ||
| data_dir.mkdir() | ||
| data_file = data_dir / "input.json" | ||
| data_file.write_text("{}", encoding="utf-8") | ||
| settings.set_data_dir(str(data_dir)) | ||
|
|
||
| assert webserver.validate_file_path("input.json") == str(data_file) | ||
|
|
||
|
|
||
| def test_validate_file_path_rejects_absolute_path(tmp_path): | ||
| data_dir = tmp_path / "data" | ||
| data_dir.mkdir() | ||
| outside_file = tmp_path / "input.json" | ||
| outside_file.write_text("{}", encoding="utf-8") | ||
| settings.set_data_dir(str(data_dir)) | ||
|
|
||
| with pytest.raises(HTTPException) as exc_info: | ||
| webserver.validate_file_path(str(outside_file)) | ||
|
|
||
| assert exc_info.value.status_code == 400 | ||
| assert "relative to CUOPT_DATA_DIR" in exc_info.value.detail | ||
|
|
||
|
|
||
| def test_validate_file_path_rejects_parent_directory_escape(tmp_path): | ||
| data_dir = tmp_path / "data" | ||
| data_dir.mkdir() | ||
| outside_file = tmp_path / "input.json" | ||
| outside_file.write_text("{}", encoding="utf-8") | ||
| settings.set_data_dir(str(data_dir)) | ||
|
|
||
| with pytest.raises(HTTPException) as exc_info: | ||
| webserver.validate_file_path("../input.json") | ||
|
|
||
| assert exc_info.value.status_code == 400 | ||
| assert "stay inside CUOPT_DATA_DIR" in exc_info.value.detail | ||
|
|
||
|
|
||
| def test_validate_file_path_rejects_symlink_escape(tmp_path): | ||
| data_dir = tmp_path / "data" | ||
| data_dir.mkdir() | ||
| outside_file = tmp_path / "input.json" | ||
| outside_file.write_text("{}", encoding="utf-8") | ||
| os.symlink(outside_file, data_dir / "linked-input.json") | ||
| settings.set_data_dir(str(data_dir)) | ||
|
|
||
| with pytest.raises(HTTPException) as exc_info: | ||
| webserver.validate_file_path("linked-input.json") | ||
|
|
||
| assert exc_info.value.status_code == 400 | ||
| assert "stay inside CUOPT_DATA_DIR" in exc_info.value.detail |
There was a problem hiding this comment.
Cover the remaining new validation branches.
This suite never exercises the CUOPT_DATA_DIR-unset branch or the new "must be a regular file" branch. A regression in either path would still leave this PR green even though both are part of the new validator behavior.
Suggested tests
+def test_validate_file_path_rejects_when_data_dir_is_unset():
+ settings.set_data_dir("")
+
+ with pytest.raises(HTTPException) as exc_info:
+ webserver.validate_file_path("input.json")
+
+ assert exc_info.value.status_code == 400
+ assert "data directory not set" in exc_info.value.detail
+
+
+def test_validate_file_path_rejects_directory_inside_data_dir(tmp_path):
+ data_dir = tmp_path / "data"
+ nested_dir = data_dir / "subdir"
+ nested_dir.mkdir(parents=True)
+ settings.set_data_dir(str(data_dir))
+
+ with pytest.raises(HTTPException) as exc_info:
+ webserver.validate_file_path("subdir")
+
+ assert exc_info.value.status_code == 400
+ assert "specified data file does not exist" in exc_info.value.detailAs per coding guidelines, **/*test*.{cpp,py}: "Add test coverage for edge cases (empty, infeasible, unbounded, degenerate) when adding new code paths" and python/**/tests/**: "Regression coverage for fixed bugs".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cuopt_server/cuopt_server/tests/test_file_path_validation.py` around
lines 22 - 74, Add tests in test_file_path_validation.py to cover the two
missing branches in webserver.validate_file_path: (1) the CUOPT_DATA_DIR-unset
path by clearing/unsetting settings.set_data_dir (or not calling it) and calling
validate_file_path to assert it raises the expected HTTPException/status and
message about CUOPT_DATA_DIR being unset; and (2) the "must be a regular file"
path by creating a non-regular target inside the data dir (e.g., a subdirectory
or FIFO) and calling validate_file_path on that name to assert it raises
HTTPException with the "regular file" (or similar) detail. Use the existing test
helpers and exception assertions (pytest.raises and exc_info.value.detail) and
reference validate_file_path and settings.set_data_dir so the new tests exercise
those branches.
Source: Coding guidelines
| if not os.path.isfile(file_path): | ||
| logging.error(f"File path '{file_path}' doesn't exist") | ||
| raise HTTPException( | ||
| status_code=400, | ||
| detail="unable to read " | ||
| "optimization data from file %s, %s" % (file_path, str(e)), | ||
| detail=f"specified data file does not exist: {cuopt_data_file}", |
There was a problem hiding this comment.
Avoid logging the resolved server path for missing files.
Line 217 logs file_path, which includes the fully resolved CUOPT_DATA_DIR path for a user-controlled miss. That leaks internal filesystem layout into server logs. Log cuopt_data_file or a generic message instead.
Suggested change
- logging.error(f"File path '{file_path}' doesn't exist")
+ logging.error(
+ "Requested cuopt data file '%s' does not exist inside CUOPT_DATA_DIR",
+ cuopt_data_file,
+ )As per coding guidelines, python/cuopt_server/**: "No credential or internal-path leakage in error responses or logs".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/cuopt_server/cuopt_server/webserver.py` around lines 216 - 220, The
log call leaks the resolved internal filesystem path via file_path; update the
error logging in webserver.py so it does not include file_path (use
cuopt_data_file or a generic message instead) and ensure any HTTPException
detail or other logs never expose the resolved server path; locate the check
using file_path and cuopt_data_file and replace the logging.error(f"File path
'{file_path}'...") with a safe message that references only cuopt_data_file or a
non-path generic string.
Source: Coding guidelines
Summary
The local-file request path for
cuopt-data-filenow enforces the documentedCUOPT_DATA_DIRboundary.Previously the server joined
CUOPT_DATA_DIRwith the header value and only checked existence. Absolute paths and..segments could resolve outside the configured data directory, and symlinks could do the same.This change rejects absolute input paths, resolves both the configured root and requested file with
realpath, verifies the result stays underCUOPT_DATA_DIR, and requires the target to be a regular file.Validation
python3 -m py_compile python/cuopt_server/cuopt_server/webserver.py python/cuopt_server/cuopt_server/tests/test_file_path_validation.pygit diff --checkuvx --with ruff ruff check python/cuopt_server/cuopt_server/webserver.py python/cuopt_server/cuopt_server/tests/test_file_path_validation.pyPYTHONPATH=python/cuopt_server uvx --with 'pytest<9.0' --with fastapi --with jsonref==1.1.0 --with msgpack==1.1.2 --with msgpack-numpy==0.4.8 --with psutil --with uvicorn --with numpy --with pandas pytest python/cuopt_server/cuopt_server/tests/test_file_path_validation.py -qThe pytest run passed with existing Pydantic deprecation warnings during server import.