Skip to content

[DNR] proto spill directly to disk#64377

Draft
owenowenisme wants to merge 1 commit into
ray-project:masterfrom
owenowenisme:claude/sleepy-diffie-a9f794
Draft

[DNR] proto spill directly to disk#64377
owenowenisme wants to merge 1 commit into
ray-project:masterfrom
owenowenisme:claude/sleepy-diffie-a9f794

Conversation

@owenowenisme

Copy link
Copy Markdown
Member

Thank you for contributing to Ray! 🚀
Please review the Ray Contribution Guide before opening a pull request.

⚠️ Remove these instructions before submitting your PR.

💡 Tip: Mark as draft if you want early feedback, or ready for review when it's complete.

Description

Briefly describe what this PR accomplishes and why it's needed.

Related issues

Link related issues: "Fixes #1234", "Closes #1234", or "Related to #1234".

Additional information

Optional: Add implementation details, API changes, usage examples, screenshots, etc.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements direct (proactive) spilling for task return objects, allowing them to be written to external storage immediately after creation instead of waiting for memory pressure. This is achieved by adding a _spill_immediately option to tasks, which is plumbed from Python through Cython and the core worker to the raylet's PinObjectIDs RPC. The raylet then triggers immediate spilling via the LocalObjectManager. Feedback on the changes identifies a bug in the updated spill-timing calculation within LocalObjectManager::SpillObjectsInternal, where concurrent, out-of-order spill operations can lead to an underestimation of the total spill time. A transition-based tracking approach using the active worker count is recommended to resolve this issue.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +388 to 397
// Account for spill throughput here (shared by both the reactive
// threshold-based path and the direct/proactive spill path) so that
// all spills are reflected in spill_time_total_s_ and surfaced in
// stats. Use max(start, last_finish) to avoid double-counting
// concurrent spill operations.
auto now = absl::GetCurrentTimeNanos();
spill_time_total_s_ +=
(now - std::max(spill_start_time, last_spill_finish_ns_)) / 1e9;
last_spill_finish_ns_ = now;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Underestimation of Spill Time with Concurrent Spills

There is a subtle but significant bug in how spill_time_total_s_ is calculated when multiple spill operations run concurrently and finish out of order.

The Bug:

When a shorter spill (Spill B) starts after a longer spill (Spill A) but finishes before Spill A, it updates last_spill_finish_ns_ to its finish time. When Spill A subsequently finishes, std::max(spill_start_time, last_spill_finish_ns_) evaluates to Spill B's finish time. As a result, the entire active spilling duration of Spill A before Spill B finished is completely discarded, leading to a severe underestimation of the total spill time.

Concrete Example:

  1. t=0: Spill A starts (spill_start_time = 0).
  2. t=5: Spill B starts (spill_start_time = 5).
  3. t=8: Spill B finishes. spill_time_total_s_ increases by (8 - std::max(5, 0)) = 3 seconds. last_spill_finish_ns_ becomes 8.
  4. t=10: Spill A finishes. spill_time_total_s_ increases by (10 - std::max(0, 8)) = 2 seconds. last_spill_finish_ns_ becomes 10.

Total accounted spill time: 3 + 2 = 5 seconds.
Actual active spilling time: 10 seconds (from t=0 to t=10, spilling was continuously active).
The interval [0, 5] is completely lost.

Recommended Solution:

Instead of using the complex and error-prone std::max logic with last_spill_finish_ns_, we can leverage the existing num_active_workers_ counter to track the union of active spilling intervals:

  1. When num_active_workers_ transitions from 0 to 1 (i.e., the first spill starts), record the start time: last_spill_start_ns_ = absl::GetCurrentTimeNanos();.
  2. When num_active_workers_ transitions from 1 to 0 (i.e., the last active spill finishes), add the elapsed time to spill_time_total_s_:
    auto now = absl::GetCurrentTimeNanos();
    spill_time_total_s_ += (now - last_spill_start_ns_) / 1e9;

This transition-based approach is 100% accurate, handles any level of concurrency/overlap, and completely eliminates the out-of-order callback bug.

Adds an opt-in `@ray.remote(_spill_immediately=True)` / `.options(...)` task
option that spills a task's plasma-backed return objects to external storage
immediately after creation, bypassing the reactive object_spilling_threshold.

The flag rides on TaskSpec, is read in CoreWorker::SealReturnObject off the
current task spec, forwarded on the PinObjectIDs RPC, and the raylet's
HandlePinObjectIDs calls LocalObjectManager::SpillObjects right after pinning.
Gated by the enable_direct_spill config (default true). Spill-timing accounting
is moved into SpillObjectsInternal so direct spills are reflected in spill stats.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: You-Cheng Lin <mses010108@gmail.com>
@owenowenisme owenowenisme force-pushed the claude/sleepy-diffie-a9f794 branch from 3c17127 to 95b3b26 Compare June 27, 2026 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ray fails to serialize self-reference objects

1 participant