Skip to content

bound array access in StoragePath url parsers#16243

Open
alhudz wants to merge 2 commits into
firebase:mainfrom
alhudz:storage-path-index-bounds
Open

bound array access in StoragePath url parsers#16243
alhudz wants to merge 2 commits into
firebase:mainfrom
alhudz:storage-path-index-bounds

Conversation

@alhudz

@alhudz alhudz commented Jun 9, 2026

Copy link
Copy Markdown

StoragePath parses gs:// and http[s]:// Storage URLs reached via Storage.reference(forURL:). Those strings frequently come from another user (download URLs persisted in a database, deep links, share flows), so a crafted value crashes the consuming app.

Repro: try StoragePath.path(string: "http://firebasestorage.googleapis.com/v0/b/bucket/o") (the /o object marker with no object path), or try StoragePath.path(string: "gs:///").
Expected: a bucket-only path, or a thrown invalid-URI error.
Actual: Fatal error: Index out of range.
Cause: path(HTTPURL:) reads pathComponents[5] after guarding only count > 4, so the bare /o form (5 components) indexes past the end; path(GSURI:) reads splitStringArray[0], but split(separator: "/", maxSplits: 1) returns an empty array when the remainder is only slashes.
Fix: require count > 5 before reading the object component, and read the bucket via splitStringArray.first. Existing parses are unchanged; the bare /o form now resolves to the bucket root and an all-slash gs:// input throws the existing error.

@gemini-code-assist

Copy link
Copy Markdown
Contributor
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

@paulb777

Copy link
Copy Markdown
Member

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request improves URL parsing in StoragePath by safely unwrapping the bucket name from GS URIs and updating the path component count check for HTTP URLs to prevent out-of-bounds indexing when the object path is missing. Unit tests were also added to cover these edge cases. The reviewer suggested validating that the fifth path component in HTTP URLs is indeed the expected "o" marker to prevent parsing malformed URLs, and recommended using splitStringArray.dropFirst().first for a cleaner, more idiomatic way to retrieve the optional object path.

Comment on lines +93 to 95
guard pathComponents.count > 5 else {
return StoragePath(with: bucketName)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When the URL contains more than 5 path components (meaning it has an object path), we should validate that the 5th component (pathComponents[4]) is indeed the expected "o" marker. This prevents parsing malformed or invalid URLs that do not follow the Firebase Storage URL structure.

    guard pathComponents.count > 5 else {
      return StoragePath(with: bucketName)
    }
    guard pathComponents[4] == "o" else {
      throw StoragePathError.storagePathError("Internal error: URL must be in the form of " +
        "http[s]://<host>/v0/b/<bucket>/o/<path/to/object>[?token=signed_url_params]")
    }

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Added the pathComponents[4] == "o" guard before reading the object component, so a non-o segment in that position now throws instead of being parsed as an object. Pushed, with a testHTTPURLThrowsOnMissingObjectMarker regression.

Comment on lines +61 to +64
if let bucketName = splitStringArray.first {
let object = splitStringArray.count == 2 ? splitStringArray[1] : nil
return StoragePath(with: bucketName, object: object)
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

Instead of checking splitStringArray.count == 2 and accessing the index splitStringArray[1], you can use splitStringArray.dropFirst().first to safely and idiomatically retrieve the optional object path. This is cleaner and avoids hardcoded index checks.

Suggested change
if let bucketName = splitStringArray.first {
let object = splitStringArray.count == 2 ? splitStringArray[1] : nil
return StoragePath(with: bucketName, object: object)
}
if let bucketName = splitStringArray.first {
let object = splitStringArray.dropFirst().first
return StoragePath(with: bucketName, object: object)
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, switched to splitStringArray.dropFirst().first. Drops the count == 2 check and the hardcoded [1] index.

@alhudz

alhudz commented Jun 16, 2026

Copy link
Copy Markdown
Author

Addressed both review points: added the /o marker guard in the HTTP parser and switched the GS object lookup to dropFirst().first. Existing parses are unchanged and there's a regression test for the new marker case.

@paulb777

Copy link
Copy Markdown
Member

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors URL parsing in StoragePath to improve safety and correctness when handling GS URIs and HTTP URLs, and adds corresponding unit tests. The review feedback identifies a bug where invalid HTTP URLs with exactly five path components bypass the 'o' marker validation and are incorrectly parsed as valid bucket-only paths. It suggests restructuring the guard checks to enforce validation before checking for an object path, and adding a unit test to cover this specific edge case.

Comment on lines +93 to +99
guard pathComponents.count > 5 else {
return StoragePath(with: bucketName)
}
guard pathComponents[4] == "o" else {
throw StoragePathError.storagePathError("Internal error: URL must be in the form of " +
"http[s]://<host>/v0/b/<bucket>/o/<path/to/object>[?token=signed_url_params]")
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If pathComponents.count is exactly 5 (e.g., http://firebasestorage.googleapis.com/v0/b/bucket/x), the guard condition pathComponents.count > 5 evaluates to false. This causes the function to immediately return StoragePath(with: bucketName) without validating whether pathComponents[4] is "o". As a result, invalid URLs with 5 path components where the 5th component is not "o" will be incorrectly parsed as valid bucket-only paths instead of throwing an error.

To fix this, we should first check if pathComponents.count > 4 and validate that pathComponents[4] == "o" before checking if there is an object path (i.e., pathComponents.count > 5).

    guard pathComponents.count > 4 else {
      return StoragePath(with: bucketName)
    }
    guard pathComponents[4] == "o" else {
      throw StoragePathError.storagePathError("Internal error: URL must be in the form of " +
        "http[s]://<host>/v0/b/<bucket>/o/<path/to/object>[?token=signed_url_params]")
    }
    guard pathComponents.count > 5 else {
      return StoragePath(with: bucketName)
    }

Comment on lines +108 to +113
func testHTTPURLThrowsOnMissingObjectMarker() {
// A path segment in the "/o" marker position that isn't "o" is not a valid
// Storage URL and must throw rather than be parsed as an object.
let httpURL = "http://firebasestorage.googleapis.com/v0/b/bucket/x/path/to/object"
XCTAssertThrowsError(try StoragePath.path(string: httpURL))
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

To prevent regressions, it would be beneficial to also test the case where the URL has exactly 5 path components but the 5th component is not "o" (e.g., http://firebasestorage.googleapis.com/v0/b/bucket/x).

  func testHTTPURLThrowsOnMissingObjectMarker() {
    // A path segment in the "/o" marker position that isn't "o" is not a valid
    // Storage URL and must throw rather than be parsed as an object.
    let httpURL = "http://firebasestorage.googleapis.com/v0/b/bucket/x/path/to/object"
    XCTAssertThrowsError(try StoragePath.path(string: httpURL))

    let shortHttpURL = "http://firebasestorage.googleapis.com/v0/b/bucket/x"
    XCTAssertThrowsError(try StoragePath.path(string: shortHttpURL))
  }

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the parsing logic of StoragePath for both GS URIs and HTTP URLs to improve safety and correctness, accompanied by new unit tests. The reviewer identified an inconsistency where invalid URLs with exactly 5 path components (e.g., ending with an invalid marker like '/x') bypass validation and return a valid StoragePath, and provided a code suggestion to enforce strict validation of the object marker.

Comment on lines +93 to +99
guard pathComponents.count > 5 else {
return StoragePath(with: bucketName)
}
guard pathComponents[4] == "o" else {
throw StoragePathError.storagePathError("Internal error: URL must be in the form of " +
"http[s]://<host>/v0/b/<bucket>/o/<path/to/object>[?token=signed_url_params]")
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is an inconsistency in how invalid URLs are handled. If a URL has 5 path components and the 5th component is invalid (e.g., http://firebasestorage.googleapis.com/v0/b/bucket/x), the current implementation will skip the check and return a valid StoragePath with the bucket name. However, if the URL has 6 path components and the 5th component is invalid (e.g., .../bucket/x/object), it will correctly throw an error.

We should validate the 5th component (the object marker) even when the path components count is exactly 5, allowing only "o" or a trailing slash "/".

    guard pathComponents.count > 4 else {
      return StoragePath(with: bucketName)
    }
    let marker = pathComponents[4]
    if marker == "o" {
      guard pathComponents.count > 5 else {
        return StoragePath(with: bucketName)
      }
    } else if marker == "/" && pathComponents.count == 5 {
      return StoragePath(with: bucketName)
    } else {
      throw StoragePathError.storagePathError("Internal error: URL must be in the form of " +
        "http[s]://<host>/v0/b/<bucket>/o/<path/to/object>[?token=signed_url_params]")
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants