fix(storage): retry throttled fts metadata listing#6994
Merged
Xuanwo
approved these changes
May 29, 2026
Collaborator
Is it possible for us to not rely on listing?
Contributor
Author
Let's remove listing in the next PR!
Bug Fix
What is the bug?
Distributed FTS indexing can encounter transient object-store list failures during partition metadata discovery, especially Azure `ServerBusy` and account egress-limit responses. These failures were not retried consistently on the Lance side, and FTS metadata listing could swallow list stream errors and later report the misleading error `No partition metadata files found`.
What issues or incorrect behavior does the bug cause?
Transient cloud throttling can fail FTS index builds instead of backing off and resuming. When the list failure happens while discovering partition metadata, the user can lose the original service error body, request ID, or throttle reason and see an empty-directory error instead. List streams also did not feed the AIMD limiter before the underlying object-store list request started.
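The error-swallowing behavior described above can be illustrated with a minimal sketch. It is not the Lance code: a `Vec` of results stands in for the object-store list stream, and `ListItem`, `collect_swallowing`, and `collect_propagating` are hypothetical names chosen for this example.

```rust
// Hypothetical stand-in for one item yielded by an object-store list stream.
type ListItem = Result<String, String>;

/// Buggy shape: stream errors are filtered out, so a throttled listing
/// looks identical to an empty directory.
fn collect_swallowing(items: Vec<ListItem>) -> Result<Vec<String>, String> {
    let paths: Vec<String> = items.into_iter().filter_map(Result::ok).collect();
    if paths.is_empty() {
        return Err("No partition metadata files found".to_string());
    }
    Ok(paths)
}

/// Fixed shape: the first stream error is returned unchanged, so the
/// service error body and throttle reason reach the caller.
fn collect_propagating(items: Vec<ListItem>) -> Result<Vec<String>, String> {
    let paths = items.into_iter().collect::<Result<Vec<String>, String>>()?;
    if paths.is_empty() {
        return Err("No partition metadata files found".to_string());
    }
    Ok(paths)
}
```

With a single throttled item as input, the first function reports the empty-directory error while the second returns the original `ServerBusy` message.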
How does this PR fix the problem?
- `list` and `list_with_offset` delegate streams, then continue observing yielded items and errors.
- `ServerBusy`, account egress limits, HTTP 429, and known rate-limit phrases.
- `client_max_retries=0` AIMD-disable guard for AWS, Azure, and GCP. Users should set `lance_aimd_max_retries=0` to explicitly disable Lance AIMD throttling.

No public Rust, Python, or Java APIs are added.
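As a rough illustration of the ideas in this section, the sketch below pairs a throttle classifier with an AIMD (additive increase, multiplicative decrease) limit and a retry loop. Everything here is an assumption made for the example: the names `is_throttle_error`, `Aimd`, and `list_with_retry`, the matched phrases, and the increase and decrease constants are hypothetical and are not taken from the Lance implementation. A real retry loop would also sleep with backoff between attempts, which is omitted here.

```rust
/// Hypothetical classifier: true when a failure looks like cloud throttling.
/// The phrase list is illustrative, not the one used by the PR.
fn is_throttle_error(status: Option<u16>, message: &str) -> bool {
    if status == Some(429) {
        return true;
    }
    let msg = message.to_ascii_lowercase();
    ["serverbusy", "egress", "rate limit", "too many requests"]
        .iter()
        .any(|phrase| msg.contains(phrase))
}

/// Hypothetical AIMD state: the limit grows by one on success and is
/// halved on a throttle signal, clamped to [min, max].
struct Aimd {
    limit: f64,
    min: f64,
    max: f64,
}

impl Aimd {
    fn on_success(&mut self) {
        self.limit = (self.limit + 1.0).min(self.max);
    }

    fn on_throttle(&mut self) {
        self.limit = (self.limit * 0.5).max(self.min);
    }
}

/// Retries `attempt` while it fails with a throttle-like error, feeding
/// every outcome to the limiter. Non-throttle errors are returned at once.
fn list_with_retry<F>(
    mut attempt: F,
    max_retries: usize,
    aimd: &mut Aimd,
) -> Result<Vec<String>, String>
where
    F: FnMut() -> Result<Vec<String>, (Option<u16>, String)>,
{
    let mut retries = 0;
    loop {
        match attempt() {
            Ok(paths) => {
                aimd.on_success();
                return Ok(paths);
            }
            Err((status, msg)) if is_throttle_error(status, &msg) && retries < max_retries => {
                aimd.on_throttle();
                retries += 1;
            }
            // Non-throttle error, or retries exhausted: keep the original message.
            Err((_, msg)) => return Err(msg),
        }
    }
}
```

In this sketch, passing `max_retries = 0` returns the first error unchanged, which keeps the original service message visible to the caller.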
Validation
- `cargo test -p lance-io list_retry`
- `CARGO_TARGET_DIR=/tmp/lance-fc2c-target cargo test -p lance-io throttle`
- `CARGO_TARGET_DIR=/tmp/lance-fc2c-target cargo test -p lance-index list_metadata_files`
- `CARGO_TARGET_DIR=/tmp/lance-fc2c-target cargo test -p lance merge_existing_index_segments_supports_fts_segments`
- `cargo fmt --all`
- `CARGO_TARGET_DIR=/tmp/lance-fc2c-target cargo clippy --all --tests --benches -- -D warnings`