fix(disagg): NVFP4 + publish-only rollout fixes for disaggregated training by jvmncs · Pull Request #1 · modal-projects/miles

jvmncs · 2026-06-23T07:07:17Z

NOTE: Requires a Megatron-LM patch.

Found while bringing up disaggregated NVFP4 rollout (cookbook/miles_disagg, Moonlight/Kimi-K2.6) on Modal:

- `megatron_to_hf/processors`: route `quant_algo == "NVFP4"` to `quantize_params_nvfp4`. ModelOpt NVFP4 checkpoints advertise `quant_method="modelopt"`, so dispatch has to key on `quant_algo`; before this change the NVFP4 export path was never reached.
- `rollout/sglang_rollout`: in publish-only mode (`rollout_num_gpus == 0`), `GenerateState.semaphore` was `Semaphore(0)`, so every rollout deadlocked. Bound generation concurrency by `sglang_server_concurrency` when `rollout_endpoint_url` is set.
- `utils/http_utils`: `init_http_client` returned early when `rollout_num_gpus == 0`, leaving `_http_client = None` and failing with "'NoneType' has no attribute 'post'". Initialize the client anyway and, in publish-only mode, bound `_client_concurrency` by `sglang_server_concurrency`.
- `update_weight/update_weight_from_disk_delta`: flatten before `.view(torch.uint8)` (`.contiguous().reshape(-1).view(torch.uint8)`) so 0-dim NVFP4 scalar tensors (`weight_scale_2`, `input_scale`) no longer crash the disk-delta encode/snapshot.
- `chat_template_utils/deepseek_v4`: guard the `encoding_dsv4` import so non-V4 models still load on sglang-miles builds that lack that symbol.
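A minimal sketch of the dispatch change, assuming a Hugging Face-style `quantization_config` dict that carries both `quant_method` and `quant_algo`. Only `quantize_params_nvfp4` is named in this PR; `quantize_params_fp8`, `passthrough`, and `select_quantizer` are hypothetical stand-ins, and the real processor signatures are not shown here.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-ins for the real export quantizers in megatron_to_hf/processors.
def quantize_params_nvfp4(name: str, param: Any) -> str:
    return f"nvfp4:{name}"

def quantize_params_fp8(name: str, param: Any) -> str:
    return f"fp8:{name}"

def passthrough(name: str, param: Any) -> str:
    return f"unquantized:{name}"

def select_quantizer(quant_config: Dict[str, Any]) -> Callable[[str, Any], str]:
    """Pick an export quantizer from an HF-style quantization_config (assumed layout)."""
    # ModelOpt NVFP4 checkpoints report quant_method="modelopt", so a branch on
    # quant_method alone never reaches NVFP4; check quant_algo first.
    if quant_config.get("quant_algo") == "NVFP4":
        return quantize_params_nvfp4
    # Hypothetical pre-existing branch keyed on quant_method.
    if quant_config.get("quant_method") == "fp8":
        return quantize_params_fp8
    return passthrough
```

Checking `quant_algo` before `quant_method` keeps any existing method-based branches intact while making the ModelOpt case reachable.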

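The two publish-only fixes share one rule: with `rollout_num_gpus == 0`, any limit derived from the local GPU count collapses to zero, so the bound has to come from `sglang_server_concurrency` instead. The asyncio sketch below assumes the old limit was a per-GPU value multiplied by `rollout_num_gpus` (the PR only says the semaphore ended up as `Semaphore(0)`); `resolve_concurrency`, `per_gpu_concurrency`, and `run_rollouts` are hypothetical names, not the code in `sglang_rollout`.

```python
import asyncio
from typing import Optional

def resolve_concurrency(rollout_num_gpus: int, per_gpu_concurrency: int,
                        sglang_server_concurrency: int,
                        rollout_endpoint_url: Optional[str]) -> int:
    """Return the generation concurrency limit (hypothetical helper)."""
    if rollout_endpoint_url is not None:
        # Publish-only: no local rollout GPUs, so bound by the remote server's limit.
        return sglang_server_concurrency
    # Local rollout (assumed formula): scales with the local GPU count.
    # With rollout_num_gpus == 0 this is 0, i.e. asyncio.Semaphore(0),
    # and every acquire() would wait forever.
    return per_gpu_concurrency * rollout_num_gpus

async def run_rollouts(n_requests: int, limit: int) -> int:
    """Run n_requests fake generations under a semaphore; return peak concurrency."""
    if limit <= 0:
        raise ValueError("concurrency limit must be positive, or every rollout deadlocks")
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def generate(_: int) -> None:
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for an HTTP call to the SGLang server
            in_flight -= 1

    await asyncio.gather(*(generate(i) for i in range(n_requests)))
    return peak
```

Failing fast on a non-positive limit turns a silent deadlock into an immediate error. The same resolved value is a natural bound for `_client_concurrency` in `http_utils`, provided the client is created before any early return.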
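The 0-dim crash follows from documented `torch.Tensor.view(dtype)` behavior: when the target dtype has a different element size, the source tensor must have at least one dimension and a unit stride in its last dimension. A minimal sketch of the flatten-first pattern, assuming an fp32 scalar scale; `tensor_to_bytes` and `bytes_to_tensor` are hypothetical helpers, not the functions in `update_weight_from_disk_delta`.

```python
import torch

def tensor_to_bytes(t: torch.Tensor) -> torch.Tensor:
    """Reinterpret any tensor, including 0-dim scalars, as a flat uint8 buffer."""
    # contiguous() guarantees stride(-1) == 1; reshape(-1) turns a 0-dim
    # scalar into shape (1,), which view(dtype) accepts.
    return t.contiguous().reshape(-1).view(torch.uint8)

def bytes_to_tensor(buf: torch.Tensor, dtype: torch.dtype, shape: torch.Size) -> torch.Tensor:
    """Inverse of tensor_to_bytes, given the original dtype and shape."""
    return buf.view(dtype).reshape(shape)

scale = torch.tensor(0.5, dtype=torch.float32)  # e.g. a 0-dim weight_scale_2
try:
    scale.view(torch.uint8)  # 0-dim tensor and a different element size: rejected
    raised = False
except RuntimeError:
    raised = True

encoded = tensor_to_bytes(scale)
decoded = bytes_to_tensor(encoded, torch.float32, scale.shape)
```

Storing the original shape alongside the bytes is what lets the round trip restore the scalar as 0-dim rather than shape `(1,)`.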

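A guarded import defers the failure from module load time to the moment a DeepSeek-V4 template is actually requested. A minimal sketch, assuming `encoding_dsv4` is a callable imported by name; the module path `sglang_miles_hypothetical.chat_encoding`, the `"deepseek_v4"` model-type string, the call signature, and the helper names are all hypothetical.

```python
from typing import Callable, Dict, List

try:
    # Hypothetical module path; builds that export encoding_dsv4 take this branch.
    from sglang_miles_hypothetical.chat_encoding import encoding_dsv4
except ImportError:  # covers both a missing module and "cannot import name"
    encoding_dsv4 = None

Message = Dict[str, str]

def render_default(messages: List[Message]) -> str:
    # Stand-in for a non-V4 chat template.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

def render_deepseek_v4(messages: List[Message]) -> str:
    if encoding_dsv4 is None:
        # Fail only when a V4 template is actually used, not at import time.
        raise RuntimeError("this sglang build does not provide encoding_dsv4")
    return encoding_dsv4(messages)  # assumed signature

def get_renderer(model_type: str) -> Callable[[List[Message]], str]:
    return render_deepseek_v4 if model_type == "deepseek_v4" else render_default
```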
jvmncs wants to merge 1 commit into `feat/disaggregated-rollout` from `nvfp4-disagg-fix2`
