[TENT] add priority and byte credits to runtime queue dispatch by zbchi · Pull Request #2655 · kvcache-ai/Mooncake

zbchi · 2026-06-28T12:13:02Z

Description

#2132
This PR extends the TENT runtime queue from FIFO dispatch to priority-aware dispatch with byte credits, and tightens the staging path when queue pressure builds up.

Queued owners are picked by Request.priority. User work and staging-internal work keep separate lanes inside each priority, so internal staging traffic can be accounted for separately without overriding the original request priority. Medium and low priority owners can also age into high priority through config.
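The lane and aging rules can be made concrete with a minimal sketch. It assumes one FIFO deque per (priority, lane) pair and a single aging threshold in microseconds. The names `PriorityLanes`, `Owner`, and `aging_threshold_us` are hypothetical. Choosing between the user and staging lanes by oldest head first is also an assumption of this sketch, not necessarily what the PR does.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical sketch: PRIO_* bands, each with a user lane and a
// staging-internal lane. Not Mooncake's actual types.
enum Priority { PRIO_HIGH = 0, PRIO_MEDIUM = 1, PRIO_LOW = 2 };
enum Lane { USER = 0, STAGING = 1 };

struct Owner {
    uint64_t id;
    Priority priority;    // inherited from Request.priority
    Lane lane;            // user work vs staging-internal work
    uint64_t enqueue_us;  // enqueue time used for aging (monotonic clock assumed)
};

class PriorityLanes {
public:
    explicit PriorityLanes(uint64_t aging_threshold_us)
        : aging_threshold_us_(aging_threshold_us) {}

    void push(const Owner& o) { lanes_[o.priority][o.lane].push_back(o); }

    // Move medium/low owners that have waited at least the threshold into
    // the high band, keeping them in their original lane.
    void age(uint64_t now_us) {
        for (int p = PRIO_MEDIUM; p <= PRIO_LOW; ++p) {
            for (int l = USER; l <= STAGING; ++l) {
                auto& q = lanes_[p][l];
                // Lanes are FIFO, so the oldest owner is at the front.
                while (!q.empty() &&
                       now_us - q.front().enqueue_us >= aging_threshold_us_) {
                    lanes_[PRIO_HIGH][l].push_back(q.front());
                    q.pop_front();
                }
            }
        }
    }

    // Highest non-empty band wins; inside a band, the lane whose head has
    // waited longest goes first (assumption of this sketch).
    std::optional<Owner> pop() {
        for (int p = PRIO_HIGH; p <= PRIO_LOW; ++p) {
            auto& user = lanes_[p][USER];
            auto& staging = lanes_[p][STAGING];
            if (user.empty() && staging.empty()) continue;
            bool take_user = staging.empty() ||
                             (!user.empty() &&
                              user.front().enqueue_us <= staging.front().enqueue_us);
            auto& q = take_user ? user : staging;
            Owner o = q.front();
            q.pop_front();
            return o;
        }
        return std::nullopt;
    }

private:
    uint64_t aging_threshold_us_;
    std::deque<Owner> lanes_[3][2];  // [priority][lane]
};
```

Aging only moves an owner between bands. Its lane is unchanged, so staging-internal traffic stays separately accounted even after promotion.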

Large queued owners now need enough byte credit before dispatch. This keeps dispatch charged by transfer size, not only owner count, while preserving FIFO order within each lane.
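Byte credits can be pictured as a deficit-round-robin scheme, which fits the "weighted byte-deficit scheduler" wording in the gemini-code-assist summary. The sketch assumes each lane earns a fixed `quantum` of bytes per round and dispatches its head owner only when the accumulated credit covers that owner's size. `CreditLane`, `quantum`, and `dispatch_round` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical deficit-style byte-credit scheduler over FIFO lanes.
struct QueuedOwner {
    uint64_t id;
    uint64_t bytes;
};

struct CreditLane {
    uint64_t quantum;              // credit earned per round (acts as the lane weight)
    uint64_t credit = 0;           // accumulated byte credit
    std::deque<QueuedOwner> fifo;  // FIFO order is preserved within a lane
};

// One scheduling round: every non-empty lane earns its quantum, then
// dispatches head owners while its credit covers their size.
std::vector<uint64_t> dispatch_round(std::vector<CreditLane>& lanes) {
    std::vector<uint64_t> dispatched;
    for (auto& lane : lanes) {
        if (lane.fifo.empty()) {
            lane.credit = 0;  // idle lanes do not bank credit
            continue;
        }
        lane.credit += lane.quantum;
        while (!lane.fifo.empty() && lane.fifo.front().bytes <= lane.credit) {
            lane.credit -= lane.fifo.front().bytes;
            dispatched.push_back(lane.fifo.front().id);
            lane.fifo.pop_front();
        }
        if (lane.fifo.empty()) lane.credit = 0;
    }
    return dispatched;
}
```

With a 64 KiB quantum, a 128 KiB owner needs two rounds of credit before it is dispatched. Small owners in another lane go out in the first round instead of waiting behind it, and the 4 KiB owner queued behind the large one in the same lane still waits its turn.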

The staging path now reports pressure back to the runtime queue. ProxyManager::submit() has a bounded shard queue and returns TooManyRequests when the shard is full. Runtime-queued staged owners are requeued on temporary proxy pressure; non-retryable proxy submit failures are completed as failures. Stage-buffer pin errors are returned as status errors instead of aborting the process.
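The pressure handling can be sketched as a bounded, mutex-protected shard queue plus the caller-side decision. `BoundedShard`, `SubmitStatus`, `StagedTask`, and `dispatch_staged` are hypothetical. Only the behaviour comes from the description above: reject with TooManyRequests when the shard is full, requeue on that status, and fail on non-retryable errors. Requeuing at the front so the owner keeps its place is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical status codes; Mooncake's real Status type differs.
enum class SubmitStatus { Ok, TooManyRequests, Failed };

struct StagedTask {
    int id;
    bool valid = true;  // stands in for "would fail for a non-retryable reason"
};

class BoundedShard {
public:
    explicit BoundedShard(size_t max_queued) : max_queued_(max_queued) {}

    SubmitStatus submit(const StagedTask& t) {
        if (!t.valid) return SubmitStatus::Failed;  // non-retryable
        std::lock_guard<std::mutex> lock(mu_);
        if (queue_.size() >= max_queued_) return SubmitStatus::TooManyRequests;
        queue_.push_back(t);
        return SubmitStatus::Ok;
    }

    size_t size() {
        std::lock_guard<std::mutex> lock(mu_);
        return queue_.size();
    }

private:
    size_t max_queued_;  // plays the role of staging/max_queued_tasks_per_shard
    std::mutex mu_;
    std::deque<StagedTask> queue_;
};

enum class Outcome { Dispatched, Requeued, CompletedWithFailure };

// Caller side in the runtime queue: requeue on temporary pressure,
// complete the owner as failed on any other error.
Outcome dispatch_staged(BoundedShard& shard, const StagedTask& t,
                        std::deque<StagedTask>& runtime_queue) {
    switch (shard.submit(t)) {
        case SubmitStatus::Ok:
            return Outcome::Dispatched;
        case SubmitStatus::TooManyRequests:
            runtime_queue.push_front(t);  // retry later without losing its place
            return Outcome::Requeued;
        default:
            return Outcome::CompletedWithFailure;
    }
}
```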

The latest changes also reduce runtime queue overhead by keeping public task lookup state per batch and draining progress-worker notifications in batches, avoiding extra map work and repeated progress passes.
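A common way to drain notifications in batches is to swap the pending list out under the lock. The worker then does one pass per wakeup instead of one per event. This is a generic sketch with a hypothetical `NotificationInbox` class, not the ProgressWorker code. Using the predicate form of `wait` also guards against a missed wakeup, where a notification that arrives before the worker waits would otherwise stall it.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <vector>

// Hypothetical inbox: producers append batch ids, the progress worker
// takes everything pending in one swap.
class NotificationInbox {
public:
    void notify(int batch_id) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.push_back(batch_id);
        }
        cv_.notify_one();
    }

    // Blocks until at least one notification is pending, then returns all
    // of them. The predicate re-checks pending_, so a notify() issued
    // before this call is not lost.
    std::vector<int> drain() {
        std::vector<int> batch;
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !pending_.empty(); });
        batch.swap(pending_);
        return batch;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<int> pending_;
};
```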

Main changes:

dispatch queued owners by PRIO_HIGH, PRIO_MEDIUM, and PRIO_LOW
add byte credits to dispatch selection
keep separate user and staging-internal lanes within each priority
add configurable aging for queued owners
preserve staging-internal admission reserves without treating owner kind as priority
make internal staging requests inherit Request.priority
bound ProxyManager shard queues with staging/max_queued_tasks_per_shard
requeue runtime-queued staged owners on proxy TooManyRequests
reduce queue overhead in public task lookup and progress-worker wakeups
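Preserving a staging-internal admission reserve while keeping owner kind separate from priority can be illustrated with a slot budget. It assumes the reserve is counted in owner slots; it could equally be bytes in the real code. `AdmissionBudget`, `OwnerKind`, and `staging_reserve` are hypothetical names. The owner kind only selects which capacity limit applies, and dispatch order is still decided by Request.priority.

```cpp
#include <cassert>
#include <cstddef>

enum class OwnerKind { User, StagingInternal };

// Hypothetical admission budget. User owners may use capacity minus the
// reserve; staging-internal owners may also use the reserved slots.
// Assumes staging_reserve <= capacity.
class AdmissionBudget {
public:
    AdmissionBudget(size_t capacity, size_t staging_reserve)
        : capacity_(capacity), staging_reserve_(staging_reserve) {}

    bool try_admit(OwnerKind kind) {
        size_t limit = (kind == OwnerKind::User) ? capacity_ - staging_reserve_
                                                 : capacity_;
        if (in_use_ >= limit) return false;
        ++in_use_;
        return true;
    }

    void release() {
        if (in_use_ > 0) --in_use_;
    }

private:
    size_t capacity_;
    size_t staging_reserve_;  // slots only staging-internal owners may take
    size_t in_use_ = 0;
};
```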

The runtime queue remains disabled by default, and public transfer APIs are unchanged.

Module

Transfer Engine (mooncake-transfer-engine)

How Has This Been Tested?

cmake --build build --target admission_queue_test tent_runtime_queue_dispatch_test tent_progress_worker_test tebench

./build/mooncake-transfer-engine/tent/tests/admission_queue_test
./build/mooncake-transfer-engine/tent/tests/tent_runtime_queue_dispatch_test
./build/mooncake-transfer-engine/tent/tests/tent_progress_worker_test

I also ran two Aliyun eRDMA benchmark checks with TENT tebench (16 vCPU / 64 GB RAM).

First, I compared direct submit with the runtime queue path on a steady 4KB workload:
Results, 4KB request size, batch size 64, 1 thread, 5s per run:

Mode	Run 1 GB/s	Run 2 GB/s	Run 3 GB/s	Avg GB/s	Avg Lat
direct	3.239529	3.239593	3.239587	3.239570	80.9 us
runtime queue	3.239583	3.239519	3.239504	3.239535	80.9 us

In this steady small-request run, the runtime queue path was effectively even with direct submit for throughput and average latency.

I also ran a queue-specific burst benchmark from tent-runtime-queue-bench against both the direct path and the runtime queue path. This uses the same workload for both modes: each burst submits low-priority work first, then high-priority work, and only polls completions after the burst is submitted.

For the runtime queue run, the dispatch window was capped at 64 owners. That is large enough to keep the transport busy, but still leaves backlog for the queue to schedule.

Runtime queue settings for this run:

{
  "enable_runtime_queue": true,
  "enable_progress_worker": true,
  "runtime_queue": {
    "max_dispatch_owners": 64,
    "max_dispatch_bytes": 1073741824
  }
}
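These two settings can be read as an in-flight window. This is a sketch under the assumption that an owner is dispatched only while both limits have room. `DispatchWindow` and its members are hypothetical; only the config keys `max_dispatch_owners` and `max_dispatch_bytes` come from the settings above. For 4KB requests, even 64 owners of 16 × 4KB each occupy about 4 MiB, far below the 1 GiB byte limit. The owner cap is therefore the limit that binds in this run.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical window mirroring runtime_queue.max_dispatch_owners and
// runtime_queue.max_dispatch_bytes.
struct DispatchWindow {
    uint64_t max_owners;
    uint64_t max_bytes;
    uint64_t inflight_owners = 0;
    uint64_t inflight_bytes = 0;

    bool can_dispatch(uint64_t bytes) const {
        if (inflight_owners >= max_owners) return false;
        // Assumption: an owner larger than max_bytes may go out alone when
        // the window is empty, so it cannot be blocked forever.
        if (inflight_owners == 0) return true;
        return inflight_bytes + bytes <= max_bytes;
    }
    void on_dispatch(uint64_t bytes) {
        ++inflight_owners;
        inflight_bytes += bytes;
    }
    // Assumes the owner was previously dispatched through on_dispatch().
    void on_complete(uint64_t bytes) {
        --inflight_owners;
        inflight_bytes -= bytes;
    }
};
```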

Results, 4KB request size, batch size 16, burst depth 512, 4 threads, 5s per run, averaged over 5 runs:
Throughput Lat is the latency derived from total throughput. The Batch Tx columns measure each burst batch from submit time to observed completion, so they include backlog and polling delay.

Mode	Avg BW GB/s	Throughput Lat	Batch Avg Tx	Batch P99 Tx	Batch P999 Tx	High Avg Tx	High P99 Tx	Low Avg Tx	Low P99 Tx
direct	3.238890	80.9 us	34148.5 us	37063.2 us	38758.8 us	32272.0 us	36881.5 us	34416.6 us	37075.8 us
runtime queue	3.042866	86.3 us	20126.3 us	29177.5 us	31852.7 us	13268.5 us	24400.4 us	21106.0 us	29377.8 us
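The reported Throughput Lat values are consistent with latency = threads × batch size × request size / aggregate bandwidth, taking 4KB = 4096 bytes and GB = 10^9 bytes. This relation is inferred from the numbers, not taken from tebench's source. It reproduces the 80.9 us entries in both tables. For the runtime-queue row it gives about 86.2 us versus the reported 86.3 us, and the small gap may come from averaging per-run latencies rather than deriving from the averaged bandwidth. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Inferred relation (assumption): each thread keeps one batch in flight, so
//   latency = threads * batch_size * request_bytes / aggregate bandwidth.
double throughput_latency_us(int threads, int batch_size, uint64_t request_bytes,
                             double gb_per_s) {
    double bytes_in_flight =
        static_cast<double>(threads) * batch_size * static_cast<double>(request_bytes);
    return bytes_in_flight / (gb_per_s * 1e9) * 1e6;  // GB taken as 1e9 bytes
}

// Relative change in percent; negative means lower than the baseline.
double percent_change(double baseline, double value) {
    return (value - baseline) / baseline * 100.0;
}
```

`percent_change(3.238890, 3.042866)` is about -6.05, which matches the roughly 6% throughput cost quoted for the burst run.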

In this burst-backlog workload, the runtime queue traded 6% throughput for lower observed batch completion latency and better high-priority latency. This is expected because the queue keeps the transport in-flight window bounded instead of flooding the whole burst into the transport at once. High-priority completion latency dropped from 32.3 ms to 13.3 ms on average, and high-priority P99 dropped from 36.9 ms to 24.4 ms.
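The latency effect of a bounded window can be seen in a toy model. It is not the real transport: one request completes per time unit in dispatch order, and a burst submits all low-priority requests before the high-priority ones. With an unbounded window (direct submit), everything is dispatched in submit order. With a bounded window, queued high-priority requests take freed slots before queued low-priority ones. The model ignores batching, polling delay, threads, and byte credits, so it shows only the direction of the effect, not the measured numbers or the throughput cost. `avg_high_completion` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <limits>

// Average completion time of the high-priority requests in the toy model.
// window must be at least 1; use SIZE_MAX to model direct submit.
double avg_high_completion(size_t n_low, size_t n_high, size_t window) {
    std::deque<bool> inflight;  // dispatched requests in completion order; true = high
    size_t queued_low = 0, queued_high = 0;
    // Submission: low requests first, then high. Each is dispatched at once
    // if the window has room; otherwise it waits in the queue.
    for (size_t i = 0; i < n_low; ++i) {
        if (inflight.size() < window) inflight.push_back(false); else ++queued_low;
    }
    for (size_t i = 0; i < n_high; ++i) {
        if (inflight.size() < window) inflight.push_back(true); else ++queued_high;
    }
    double high_sum = 0.0;
    size_t t = 0;
    while (!inflight.empty()) {
        bool is_high = inflight.front();
        inflight.pop_front();
        ++t;  // one completion per time unit
        if (is_high) high_sum += static_cast<double>(t);
        // Refill the freed slot, preferring queued high-priority work.
        if (queued_high > 0) {
            inflight.push_back(true);
            --queued_high;
        } else if (queued_low > 0) {
            inflight.push_back(false);
            --queued_low;
        }
    }
    return n_high == 0 ? 0.0 : high_sum / static_cast<double>(n_high);
}
```

With 8 low and 4 high requests, direct submit completes the high ones at times 9 to 12 (average 10.5). A window of 2 makes them wait only for the two low requests already in flight, so they complete at times 3 to 6 (average 4.5).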

Checklist

I have performed a self-review of my own code
I have formatted my code using ./scripts/code_format.sh
I have run pre-commit run --all-files and all hooks pass
I have updated the documentation (if applicable)
I have added tests to prove my changes are effective
For changes >500 LOC: I have filed an RFC issue

AI Assistance Disclosure

No AI tools were used
AI tools were used (specify below)
Claude Code was used to assist with implementation and review. The final changes were reviewed and validated by myself.

gemini-code-assist

Code Review

This pull request introduces a weighted byte-deficit scheduler with priority aging to the admission queue, adds bounded queue support to the proxy manager shards, and optimizes the progress worker loop. The review feedback highlights critical concurrency issues in ProxyManager where stage_buffers_ is accessed without synchronization, a logic bug in ProgressWorker that could stall the runtime queue, load-balancing limitations due to the use of thread_local in shard selection, and potential crashes from unhandled JSON parsing exceptions in the control plane.
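On the shard-selection point: if the shard is chosen once per thread (for example, cached in a `thread_local`), all submissions from one thread land on the same shard. One common alternative is a shared atomic round-robin counter, sketched below. `ShardPicker` is a hypothetical class, and this is not necessarily what the follow-up commit 862a60d does.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical shard picker: a shared counter spreads submissions from
// any thread across all shards. num_shards must be greater than 0.
class ShardPicker {
public:
    explicit ShardPicker(size_t num_shards) : num_shards_(num_shards) {}

    size_t next() {
        // Relaxed ordering suffices: only the counter value matters here.
        return next_.fetch_add(1, std::memory_order_relaxed) % num_shards_;
    }

private:
    size_t num_shards_;
    std::atomic<size_t> next_{0};
};
```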


codecov-commenter · 2026-06-28T12:36:32Z


Codecov Report

✅ All modified and coverable lines are covered by tests.


zbchi added 11 commits June 26, 2026 16:28

extract queue dispatch scheduler

0dfd12c

dispatch queue owners by priority

0cbca1b

separate staging dispatch lane

6a93f5e

age queued transfers during dispatch

beb9cfd

requeue dispatching queue owners

282c695

preserve staging request priority

b8bb317

return stage buffer pin errors

083839d

bound staging proxy queue

4c657ba

dispatch queue owners with byte credits

ce1e553

perf: reduce runtime queue overhead

ff28fd9

polish runtime queue dispatch

ecb50c8

zbchi requested review from 00fish0, alogfans, chestnut-Q, doujiang24, dtcccc and staryxchen as code owners June 28, 2026 12:13

github-actions Bot added run-ci Transfer Engine labels Jun 28, 2026

gemini-code-assist Bot reviewed Jun 28, 2026


alogfans reviewed Jun 29, 2026


Comment thread mooncake-transfer-engine/tent/src/runtime/control_plane.cpp Outdated

zbchi added 3 commits June 29, 2026 22:58

guard stage buffer map access

11f4544

guard pin stage response parsing

d8ed1f1

fix queue wake and proxy shard selection

862a60d

alogfans reviewed Jun 30, 2026


Comment thread mooncake-transfer-engine/tent/src/runtime/admission_queue.cpp Outdated

remove queued dispatch proxy fallback

cf83d3f

zbchi wants to merge 15 commits into kvcache-ai:main from zbchi:tent-queue-3
