feat(client): share adaptive concurrency controllers per (session, endpoint)#871
Open
assafvayner wants to merge 3 commits into
Open
feat(client): share adaptive concurrency controllers per (session, endpoint)#871assafvayner wants to merge 3 commits into
assafvayner wants to merge 3 commits into
Conversation
AdaptiveConcurrencyController only ever read ctx.config, but storing the full XetContext gave cached controllers a strong Arc<XetCommon> back-edge: XetCommon -> runtime_cache -> controller -> ctx.common, keeping every session's XetCommon alive for the process lifetime. Store Arc<XetConfig> instead, and pin the property with a Weak-upgrade regression test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Human Summary:
Having all downloads and all uploads share the same concurrency controller within the same XetSession (all download groups and all upload commit builders under the same session)
2 download groups or 2 upload commits concurrently share the same semaphores for concurrently downloading segments/uploading objects. more importantly they cannot interfere with each other.
when 1 group/upload finishes, if another one starts, then next will have the previous adaptive concurrency state as the baseline so no need for a "ramp back up" (or down).
uses the dynamic cache system to allocate the adaptors. adaptor is "per endpoint" in the very rare chance we end up hitting 2 separate endpoints.
minor refactor required, the adaptive concurrency held a XetContext Arc which under the new additions would cause a circular Arc dependency (memory leak guaranteed). Only the XetConfig was used so we add the xet config instead of XetContext.
Summary
AdaptiveConcurrencyControllers (xet_client/src/cas_client/adaptive_concurrency/cache.rs), stored via the existingXetCommon::cache_get_or_createmechanism.RemoteClient::new_with_socketnow fetches its upload/download controllers from this cache instead of constructing fresh ones, so all upload commits, file download groups, and download stream groups created from oneXetSessionand targeting the same CAS endpoint share one adaptive concurrency model and one concurrency limit (uploads and downloads remain independent).AdaptiveConcurrencyControllernow storesArc<XetConfig>instead of a fullXetContext. Since the cache lives insideXetCommon, a controller holdingXetContext(which ownsArc<XetCommon>) would form a strong reference cycle that keeps each session'sXetCommonalive for the process lifetime. The controller only ever readctx.config, so this drops the over-broad dependency; a regression test pins the property (drop the context, assertWeak<XetCommon>no longer upgrades).Behavior change
Previously each
XetUploadCommit/ download group builder ramped its own controller independently, so a session's effective connection ceiling wasN builders × ac_max_*_concurrencywith the controllers competing blindly. Nowac_max_upload_concurrency/ac_max_download_concurrencybound the total in-flight requests per (session, endpoint), and later builders warm-start from the already-learned concurrency level instead of cold-starting.Notes:
max_concurrent_file_ingestion/max_concurrent_file_downloads) were already session-shared viaXetCommonand are untouched.LocalClient/MemoryClientintentionally keep per-instance controllers.local_servertest path creates twoRemoteClients on one ctx+endpoint and now shares controllers between them; audited — no test asserts independent ramping.Test Plan
Arc::ptr_eq), different endpoint distinct, different ctx distinct, upload vs download distinctXetCommonalive (noArccycle)RemoteClienttest covering the sharing matrix end-to-end through the constructor, including non-eviction when a second endpoint is addedcargo test -p xet-client: 138 passed, 0 failed (4 network tests ignored as usual)cargo check -p xet-data -p hf-xetclean;cargo +nightly fmt+cargo clippy -p xet-client --all-targetsshow no warnings in changed filesNote
Medium Risk
Changes how concurrent uploads/downloads compete within a session (global semaphore per direction/endpoint instead of per
RemoteClient), which can alter throughput and ramp-up behavior for multi-group workloads.Overview
Session-wide adaptive concurrency for
RemoteClient: upload and downloadAdaptiveConcurrencyControllerinstances are now looked up from a new endpoint-keyed cache onXetCommon(cache.rs) instead of being created per client. All clients in the sameXetSessionand CAS endpoint share one upload and one download controller (separate maps), soac_max_*_concurrencycaps total in-flight CAS requests and later work reuses the learned limit instead of cold-starting per upload commit or download group.Refactor to avoid leaks:
AdaptiveConcurrencyControllerstoresArc<XetConfig>instead ofXetContext, because caching controllers insideXetCommonwould otherwise create anArccycle (XetContext→XetCommon→ controller →XetContext). Tests assert sharing by ctx/endpoint, non-eviction when adding a second endpoint, and that dropping the context releasesXetCommon.Reviewed by Cursor Bugbot for commit 22b55fe. Bugbot is set up for automated code reviews on this repo. Configure here.