[Data] [Core] [4/n] Switch ResourceManager OSM Estimation to use BlockRefCounter #64192
Open
rayhhome wants to merge 62 commits into ray-project:master from rayhhome:block-ref-counter-resource
62 commits
ff9f5e4 c++ side changes (rayhhome)
5c3780a cython layer changes (rayhhome)
00b8495 Merge branch 'master' into callback-core-wiring (rayhhome)
eb668f8 Address reviews (rayhhome)
2ec8d81 Merge branch 'master' into callback-core-wiring (rayhhome)
711d5a3 Fix ObjectRef paramemter (rayhhome)
06a657d Set in_core_worker to avoid void decrement of objectref refcount (rayhhome)
c534154 AddObjectOutOfScopeOrFreedCallback made public (rayhhome)
a7662dc Merge branch 'master' into callback-core-wiring (rayhhome)
5e624fe Add test to bazel file (rayhhome)
c365b4f Merge branch 'master' into callback-core-wiring (rayhhome)
8f422f7 Fix test (rayhhome)
b673e05 Merge branch 'callback-core-wiring' of github.com:rayhhome/ray into c… (rayhhome)
d325891 Merge branch 'master' into callback-core-wiring (rayhhome)
33ed991 Address Shutdown edge case (rayhhome)
f830175 Merge branch 'callback-core-wiring' of github.com:rayhhome/ray into c… (rayhhome)
1adeed4 Address reviews (rayhhome)
71e5ee4 Merge branch 'master' of github.com:ray-project/ray into callback-cor… (rayhhome)
738dcaa Address comments again (rayhhome)
6191d3a Merge branch 'master' into callback-core-wiring (rayhhome)
2f43a3e Address reviews (rayhhome)
5e6d39e Merge branch 'master' into callback-core-wiring (rayhhome)
3fa9b00 Address comments and ameliorate codebase (rayhhome)
78d4a23 Add callback latency measuring tests (rayhhome)
ebecd5a Merge branch 'master' into callback-core-wiring (rayhhome)
a37a262 Put ref registration assertion in test scoped (rayhhome)
96f38b7 Merge branch 'master' into callback-core-wiring (rayhhome)
0280347 Fix ray_perf.py unfired callback (rayhhome)
7903d34 Scaled-up callback throughput benchmark (rayhhome)
8b22ae9 Bump up thread count due to new cleanup thread (rayhhome)
06d491f Merge branch 'master' of github.com:ray-project/ray into callback-cor… (rayhhome)
16eca1f Remove 20 from thread count since we net increased thread count (rayhhome)
79fb62f Merge branch 'master' into callback-core-wiring (rayhhome)
13f4e46 Integrate more realistic benchmark (rayhhome)
eafbdef Merge branch 'master' into callback-core-wiring (rayhhome)
72ae1b3 Add BlockRefCounter Implementation and Tests (rayhhome)
8012534 Address comments and update tests (rayhhome)
8c8b649 Make on_block_produced idempotent (rayhhome)
f9812f0 Address comments (rayhhome)
e759d70 Pyrefly fixes + move benchmark (rayhhome)
0b2bb38 Fix pyrefly again (rayhhome)
98a3552 Address edge case of unowned blocks in SplitCoordinator (rayhhome)
46fadc6 wrap block_ref in counter actor (rayhhome)
30aeb9f Comments improvement (rayhhome)
1547a61 Wire blockRefCounter through operators (rayhhome)
a677cca Add missing type notations + missing hash shuffle change (rayhhome)
44d4080 Address comments + Make BlockRefCounter mandatory in call chains (rayhhome)
e601d8a Track blocks for limit, zip, and output splitter (rayhhome)
f4fef45 Track blocks for aggregate num rows (rayhhome)
98ef994 Track blocks for shuffle reduce (rayhhome)
c5a12d9 simplify shuffle reduce memory tracking logic (rayhhome)
efe4579 Add argument for blockRefCounter (rayhhome)
40f1153 Add block_ref_counter argument to build_streaming_topology (rayhhome)
57a15c3 Remove duplicate start call mock_all_to_all_op (rayhhome)
6350a1e Fix pyrefly (rayhhome)
60bf01f Address missed start calls (rayhhome)
9650f83 Address start argument changes (rayhhome)
2b185dd Adjust resource manager object store memory tracking logic (rayhhome)
7647790 Address reviews + remove implementation detail descriptions from test… (rayhhome)
ed666df Also address resource manager unit test by switching to StubBlockRefC… (rayhhome)
285c850 Remove dead code (rayhhome)
2edfb95 Fix StubBlockRefCounter argument (rayhhome)
@@ -0,0 +1,86 @@
+import threading
+from collections import defaultdict
+from typing import Callable, Dict, Optional
+
+import ray
+from ray._private.worker import global_worker
+
+
+class BlockRefCounter:
+    """Tracks object-store memory usage per operator via Ray Core callbacks.
+
+    The callback fires when:
+    - All Python ObjectRefs wrapping the block's ObjectID are garbage-collected, AND
+    - All Ray tasks that received the block as an argument have completed.
+    """
+
+    def __init__(
+        self,
+        add_object_out_of_scope_callback: Optional[
+            Callable[["ray.ObjectRef", Callable[[bytes], None]], bool]
+        ] = None,
+    ):
+        if add_object_out_of_scope_callback is None:
+            add_object_out_of_scope_callback = (
+                global_worker.core_worker.add_object_out_of_scope_callback  # pyrefly: ignore[missing-attribute]
+            )
+        self._add_callback_fn = add_object_out_of_scope_callback
+        # IDs of live blocks. Stale callbacks (fired after clear()) check
+        # membership here and no-op, preventing negative _bytes_by_producer.
+        self._registered_ids: set[bytes] = set()
+        # (producer_id -> total live bytes); maintained incrementally for O(1) reads.
+        self._bytes_by_producer: Dict[str, int] = defaultdict(int)
+        self._lock = threading.Lock()
+
+    def on_block_produced(
+        self,
+        block_ref: "ray.ObjectRef",
+        size_bytes: int,
+        producer_id: str,
+    ) -> None:
+        """Register a block and attribute its memory to producer_id.
+
+        Registers a Ray Core out-of-scope callback so that when all references
+        to block_ref are gone the bytes are automatically removed from the
+        producer's usage.
+
+        Idempotent: calling twice with the same block_ref is a no-op.
+        """
+        id_binary = block_ref.binary()
+        with self._lock:
+            if id_binary in self._registered_ids:
+                return
+            self._registered_ids.add(id_binary)
+            self._bytes_by_producer[producer_id] += size_bytes
+
+        def _on_object_freed(id_bytes: bytes) -> None:
+            with self._lock:
+                if id_bytes not in self._registered_ids:
+                    # Already cleared (e.g. by clear()), nothing to do.
+                    return
+                self._registered_ids.discard(id_bytes)
+                self._bytes_by_producer[producer_id] -= size_bytes
+
+        try:
+            registered = self._add_callback_fn(block_ref, _on_object_freed)
+        except ValueError:
+            # Block not owned by this worker; can't track it.
+            _on_object_freed(id_binary)
+            return
+        if not registered:
+            _on_object_freed(id_binary)
+
+    def get_object_store_memory_usage(self, producer_id: str) -> int:
+        """Total bytes of live blocks attributed to producer_id."""
+        with self._lock:
+            return self._bytes_by_producer.get(producer_id, 0)
+
+    def clear(self) -> None:
+        """Reset all accounting, e.g. on executor shutdown.
+
+        Any previously registered Ray Core callbacks firing after clear()
+        will be silently ignored because _registered_ids is empty.
+        """
+        with self._lock:
+            self._registered_ids.clear()
+            self._bytes_by_producer.clear()

rayhhome marked this conversation as resolved.
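
The accounting in the diff above can be exercised without a Ray cluster because the callback registrar is injectable. Below is a minimal sketch of that idea, not the PR's own test code: `FakeRef`, `FakeCallbackRegistry`, and `MiniBlockRefCounter` are hypothetical stand-ins introduced only for illustration. `MiniBlockRefCounter` is a condensed copy of the logic shown in the diff (the `ValueError` path is omitted), and the fake registry is assumed to behave like a hook that invokes the callback with the object's ID bytes once the object is freed.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, Set


class FakeRef:
    """Hypothetical stand-in for ray.ObjectRef; only binary() is needed."""

    def __init__(self, object_id: bytes):
        self._object_id = object_id

    def binary(self) -> bytes:
        return self._object_id


class FakeCallbackRegistry:
    """Hypothetical stand-in for the Ray Core out-of-scope callback hook."""

    def __init__(self) -> None:
        self._callbacks: Dict[bytes, Callable[[bytes], None]] = {}

    def add_callback(self, ref: FakeRef, callback: Callable[[bytes], None]) -> bool:
        self._callbacks[ref.binary()] = callback
        return True

    def free(self, ref: FakeRef) -> None:
        # Simulate the object going out of scope: fire its callback once.
        callback = self._callbacks.pop(ref.binary(), None)
        if callback is not None:
            callback(ref.binary())


class MiniBlockRefCounter:
    """Condensed copy of the accounting logic from the diff (illustrative only)."""

    def __init__(self, add_callback_fn: Callable[..., bool]) -> None:
        self._add_callback_fn = add_callback_fn
        self._registered_ids: Set[bytes] = set()
        self._bytes_by_producer: Dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def on_block_produced(self, block_ref: FakeRef, size_bytes: int, producer_id: str) -> None:
        id_binary = block_ref.binary()
        with self._lock:
            if id_binary in self._registered_ids:
                return  # Idempotent: a second registration is a no-op.
            self._registered_ids.add(id_binary)
            self._bytes_by_producer[producer_id] += size_bytes

        def _on_object_freed(id_bytes: bytes) -> None:
            with self._lock:
                if id_bytes not in self._registered_ids:
                    return  # Stale callback, e.g. fired after clear().
                self._registered_ids.discard(id_bytes)
                self._bytes_by_producer[producer_id] -= size_bytes

        if not self._add_callback_fn(block_ref, _on_object_freed):
            _on_object_freed(id_binary)  # Could not register: undo the accounting.

    def get_object_store_memory_usage(self, producer_id: str) -> int:
        with self._lock:
            return self._bytes_by_producer.get(producer_id, 0)

    def clear(self) -> None:
        with self._lock:
            self._registered_ids.clear()
            self._bytes_by_producer.clear()


registry = FakeCallbackRegistry()
counter = MiniBlockRefCounter(registry.add_callback)
block_a, block_b = FakeRef(b"a"), FakeRef(b"b")

counter.on_block_produced(block_a, 100, "op1")
counter.on_block_produced(block_a, 100, "op1")  # Duplicate: not double-counted.
counter.on_block_produced(block_b, 50, "op1")
usage_after_produce = counter.get_object_store_memory_usage("op1")

registry.free(block_a)  # Callback fires and removes block_a's 100 bytes.
usage_after_free = counter.get_object_store_memory_usage("op1")

counter.clear()
registry.free(block_b)  # Stale callback after clear(): ignored.
usage_after_clear = counter.get_object_store_memory_usage("op1")
```

In this sketch the membership check against `_registered_ids` is what keeps a callback that fires after `clear()` from driving a producer's total negative.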