bound array access in StoragePath url parsers#16243
Conversation
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request improves URL parsing in StoragePath by safely unwrapping the bucket name from GS URIs and updating the path component count check for HTTP URLs to prevent out-of-bounds indexing when the object path is missing. Unit tests were also added to cover these edge cases. The reviewer suggested validating that the fifth path component in HTTP URLs is indeed the expected "o" marker to prevent parsing malformed URLs, and recommended using splitStringArray.dropFirst().first for a cleaner, more idiomatic way to retrieve the optional object path.
| guard pathComponents.count > 5 else { | ||
| return StoragePath(with: bucketName) | ||
| } |
There was a problem hiding this comment.
When the URL contains more than 5 path components (meaning it has an object path), we should validate that the 5th component (pathComponents[4]) is indeed the expected "o" marker. This prevents parsing malformed or invalid URLs that do not follow the Firebase Storage URL structure.
guard pathComponents.count > 5 else {
return StoragePath(with: bucketName)
}
guard pathComponents[4] == "o" else {
throw StoragePathError.storagePathError("Internal error: URL must be in the form of " +
"http[s]://<host>/v0/b/<bucket>/o/<path/to/object>[?token=signed_url_params]")
}There was a problem hiding this comment.
Good catch. Added the pathComponents[4] == "o" guard before reading the object component, so a non-o segment in that position now throws instead of being parsed as an object. Pushed, with a testHTTPURLThrowsOnMissingObjectMarker regression.
| if let bucketName = splitStringArray.first { | ||
| let object = splitStringArray.count == 2 ? splitStringArray[1] : nil | ||
| return StoragePath(with: bucketName, object: object) | ||
| } |
There was a problem hiding this comment.
Instead of checking splitStringArray.count == 2 and accessing the index splitStringArray[1], you can use splitStringArray.dropFirst().first to safely and idiomatically retrieve the optional object path. This is cleaner and avoids hardcoded index checks.
| if let bucketName = splitStringArray.first { | |
| let object = splitStringArray.count == 2 ? splitStringArray[1] : nil | |
| return StoragePath(with: bucketName, object: object) | |
| } | |
| if let bucketName = splitStringArray.first { | |
| let object = splitStringArray.dropFirst().first | |
| return StoragePath(with: bucketName, object: object) | |
| } |
There was a problem hiding this comment.
Done, switched to splitStringArray.dropFirst().first. Drops the count == 2 check and the hardcoded [1] index.
|
Addressed both review points: added the |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors URL parsing in StoragePath to improve safety and correctness when handling GS URIs and HTTP URLs, and adds corresponding unit tests. The review feedback identifies a bug where invalid HTTP URLs with exactly five path components bypass the 'o' marker validation and are incorrectly parsed as valid bucket-only paths. It suggests restructuring the guard checks to enforce validation before checking for an object path, and adding a unit test to cover this specific edge case.
| guard pathComponents.count > 5 else { | ||
| return StoragePath(with: bucketName) | ||
| } | ||
| guard pathComponents[4] == "o" else { | ||
| throw StoragePathError.storagePathError("Internal error: URL must be in the form of " + | ||
| "http[s]://<host>/v0/b/<bucket>/o/<path/to/object>[?token=signed_url_params]") | ||
| } |
There was a problem hiding this comment.
If pathComponents.count is exactly 5 (e.g., http://firebasestorage.googleapis.com/v0/b/bucket/x), the guard condition pathComponents.count > 5 evaluates to false. This causes the function to immediately return StoragePath(with: bucketName) without validating whether pathComponents[4] is "o". As a result, invalid URLs with 5 path components where the 5th component is not "o" will be incorrectly parsed as valid bucket-only paths instead of throwing an error.
To fix this, we should first check if pathComponents.count > 4 and validate that pathComponents[4] == "o" before checking if there is an object path (i.e., pathComponents.count > 5).
guard pathComponents.count > 4 else {
return StoragePath(with: bucketName)
}
guard pathComponents[4] == "o" else {
throw StoragePathError.storagePathError("Internal error: URL must be in the form of " +
"http[s]://<host>/v0/b/<bucket>/o/<path/to/object>[?token=signed_url_params]")
}
guard pathComponents.count > 5 else {
return StoragePath(with: bucketName)
}| func testHTTPURLThrowsOnMissingObjectMarker() { | ||
| // A path segment in the "/o" marker position that isn't "o" is not a valid | ||
| // Storage URL and must throw rather than be parsed as an object. | ||
| let httpURL = "http://firebasestorage.googleapis.com/v0/b/bucket/x/path/to/object" | ||
| XCTAssertThrowsError(try StoragePath.path(string: httpURL)) | ||
| } |
There was a problem hiding this comment.
To prevent regressions, it would be beneficial to also test the case where the URL has exactly 5 path components but the 5th component is not "o" (e.g., http://firebasestorage.googleapis.com/v0/b/bucket/x).
func testHTTPURLThrowsOnMissingObjectMarker() {
// A path segment in the "/o" marker position that isn't "o" is not a valid
// Storage URL and must throw rather than be parsed as an object.
let httpURL = "http://firebasestorage.googleapis.com/v0/b/bucket/x/path/to/object"
XCTAssertThrowsError(try StoragePath.path(string: httpURL))
let shortHttpURL = "http://firebasestorage.googleapis.com/v0/b/bucket/x"
XCTAssertThrowsError(try StoragePath.path(string: shortHttpURL))
}There was a problem hiding this comment.
Code Review
This pull request refactors the parsing logic of StoragePath for both GS URIs and HTTP URLs to improve safety and correctness, accompanied by new unit tests. The reviewer identified an inconsistency where invalid URLs with exactly 5 path components (e.g., ending with an invalid marker like '/x') bypass validation and return a valid StoragePath, and provided a code suggestion to enforce strict validation of the object marker.
| guard pathComponents.count > 5 else { | ||
| return StoragePath(with: bucketName) | ||
| } | ||
| guard pathComponents[4] == "o" else { | ||
| throw StoragePathError.storagePathError("Internal error: URL must be in the form of " + | ||
| "http[s]://<host>/v0/b/<bucket>/o/<path/to/object>[?token=signed_url_params]") | ||
| } |
There was a problem hiding this comment.
There is an inconsistency in how invalid URLs are handled. If a URL has 5 path components and the 5th component is invalid (e.g., http://firebasestorage.googleapis.com/v0/b/bucket/x), the current implementation will skip the check and return a valid StoragePath with the bucket name. However, if the URL has 6 path components and the 5th component is invalid (e.g., .../bucket/x/object), it will correctly throw an error.
We should validate the 5th component (the object marker) even when the path components count is exactly 5, allowing only "o" or a trailing slash "/".
guard pathComponents.count > 4 else {
return StoragePath(with: bucketName)
}
let marker = pathComponents[4]
if marker == "o" {
guard pathComponents.count > 5 else {
return StoragePath(with: bucketName)
}
} else if marker == "/" && pathComponents.count == 5 {
return StoragePath(with: bucketName)
} else {
throw StoragePathError.storagePathError("Internal error: URL must be in the form of " +
"http[s]://<host>/v0/b/<bucket>/o/<path/to/object>[?token=signed_url_params]")
}
StoragePathparsesgs://andhttp[s]://Storage URLs reached viaStorage.reference(forURL:). Those strings frequently come from another user (download URLs persisted in a database, deep links, share flows), so a crafted value crashes the consuming app.Repro:
try StoragePath.path(string: "http://firebasestorage.googleapis.com/v0/b/bucket/o")(the/oobject marker with no object path), ortry StoragePath.path(string: "gs:///").Expected: a bucket-only path, or a thrown invalid-URI error.
Actual:
Fatal error: Index out of range.Cause:
path(HTTPURL:)readspathComponents[5]after guarding onlycount > 4, so the bare/oform (5 components) indexes past the end;path(GSURI:)readssplitStringArray[0], butsplit(separator: "/", maxSplits: 1)returns an empty array when the remainder is only slashes.Fix: require
count > 5before reading the object component, and read the bucket viasplitStringArray.first. Existing parses are unchanged; the bare/oform now resolves to the bucket root and an all-slashgs://input throws the existing error.