fix(disagg): NVFP4 + publish-only rollout fixes for disaggregated tra…#1

Open

jvmncs wants to merge 1 commit into feat/disaggregated-rollout from nvfp4-disagg-fix2

jvmncs commented Jun 23, 2026

…ining

NOTE: Requires Megatron-LM patch here

Found while bringing up disaggregated NVFP4 rollout (cookbook/miles_disagg, Moonlight/Kimi-K2.6) on Modal:

- megatron_to_hf/processors: route quant_algo=="NVFP4" to quantize_params_nvfp4. modelopt NVFP4 checkpoints advertise quant_method="modelopt", so the dispatch has to key on quant_algo; previously the NVFP4 export path was never reached.
- rollout/sglang_rollout: in publish-only mode (rollout_num_gpus==0), GenerateState.semaphore was created as Semaphore(0), so every rollout deadlocked. Generation concurrency is now bounded by sglang_server_concurrency when rollout_endpoint_url is set.
- utils/http_utils: init_http_client returned early when rollout_num_gpus==0, leaving _http_client=None and failing later with "'NoneType' has no attribute 'post'". In publish-only mode it now initializes the client and bounds _client_concurrency by sglang_server_concurrency.
- update_weight/update_weight_from_disk_delta: flatten before .view(torch.uint8), i.e. .contiguous().reshape(-1).view(torch.uint8), so 0-dim NVFP4 scalar tensors (weight_scale_2, input_scale) don't crash the disk-delta encode/snapshot.
- chat_template_utils/deepseek_v4: guard the `encoding_dsv4` import so non-V4 models load on sglang-miles builds without that symbol.
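For reviewers, the quant_algo routing from the processors item can be sketched roughly as follows. This is an illustration only: `pick_quantizer`, the shape of the config dict, and the fallback table entry are assumptions, not the code in this PR. Only `quantize_params_nvfp4`, `quant_method="modelopt"`, and `quant_algo=="NVFP4"` come from the description.

```python
def pick_quantizer(quantization_config):
    """Hypothetical dispatcher: return the name of the quantize routine to use.

    `quantization_config` is assumed to be a plain dict carrying
    `quant_method` and, for modelopt checkpoints, `quant_algo`.
    """
    method = quantization_config.get("quant_method")
    algo = quantization_config.get("quant_algo")

    # modelopt NVFP4 checkpoints report quant_method == "modelopt", so a
    # lookup keyed only on quant_method never selects the NVFP4 path.
    if algo == "NVFP4":
        return "quantize_params_nvfp4"

    # Assumed fallback table keyed on quant_method (entry is illustrative).
    by_method = {"fp8": "quantize_params_fp8"}
    return by_method.get(method)  # None means "no quantized export"


print(pick_quantizer({"quant_method": "modelopt", "quant_algo": "NVFP4"}))
```

Checking quant_algo before quant_method keeps any existing method-keyed routes untouched while making the NVFP4 branch reachable.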
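The Semaphore(0) deadlock from the sglang_rollout item is easy to reproduce in isolation. The sketch below is a minimal stand-in, not the PR's code: `generation_limit`, `try_generate`, the multiplication by `rollout_num_gpus`, and the example URL are assumptions chosen to show why a limit of 0 blocks forever and how falling back to `sglang_server_concurrency` avoids it.

```python
import asyncio


def generation_limit(rollout_num_gpus, sglang_server_concurrency,
                     rollout_endpoint_url=None):
    """Hypothetical helper: how many generate calls may run at once."""
    if rollout_endpoint_url is not None:
        # Publish-only: there are no local rollout GPUs, so use the
        # per-server concurrency instead of scaling by a GPU count of 0.
        return sglang_server_concurrency
    # Assumed original behavior: concurrency scales with the GPU count.
    return sglang_server_concurrency * rollout_num_gpus


async def try_generate(limit, timeout=0.1):
    """Return True if one rollout can acquire the semaphore before `timeout`."""
    semaphore = asyncio.Semaphore(limit)
    try:
        await asyncio.wait_for(semaphore.acquire(), timeout)
    except asyncio.TimeoutError:
        return False  # with Semaphore(0), acquire() never completes
    semaphore.release()
    return True


def demo():
    old_limit = generation_limit(0, 8)  # 8 * 0 == 0
    new_limit = generation_limit(0, 8, "http://example.invalid")
    return (asyncio.run(try_generate(old_limit)),
            asyncio.run(try_generate(new_limit)))


print(demo())
```

The timeout exists only so the demonstration terminates; in the real rollout loop there is no timeout, which is why every rollout hung.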
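The flatten-before-view change from the disk-delta item can be illustrated with a small helper. This is a sketch under assumptions: `as_byte_view` is a hypothetical name, and it relies on PyTorch's documented behavior that `Tensor.view(dtype)` with a different element size rescales the last dimension, which a 0-dim tensor does not have.

```python
import torch


def as_byte_view(tensor: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: reinterpret a tensor's storage as flat uint8.

    reshape(-1) turns a 0-dim scalar into a 1-element vector, so the
    dtype view has a last dimension to rescale by the element size.
    """
    return tensor.contiguous().reshape(-1).view(torch.uint8)


# A 0-dim float32 scalar, standing in for weight_scale_2 or input_scale.
scalar = torch.tensor(0.5, dtype=torch.float32)
print(as_byte_view(scalar).shape)

# Ordinary tensors flatten to numel * element_size bytes.
matrix = torch.zeros(2, 3, dtype=torch.bfloat16)
print(as_byte_view(matrix).shape)
```

Calling `.contiguous()` first keeps the byte view valid for non-contiguous inputs, where a dtype view can otherwise be rejected.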
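The import guard from the deepseek_v4 item follows the usual optional-dependency pattern. The sketch below is generic: `optional_symbol`, `require_dsv4`, and the module name `hypothetical_sglang_module` are placeholders, since the description does not state where `encoding_dsv4` lives.

```python
import importlib


def optional_symbol(module_name, symbol):
    """Return `symbol` from `module_name`, or None if either is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, symbol, None)


# Placeholder module path; the real import location is not given here.
encoding_dsv4 = optional_symbol("hypothetical_sglang_module", "encoding_dsv4")


def require_dsv4():
    """Fail only when a V4 code path actually needs the symbol."""
    if encoding_dsv4 is None:
        raise RuntimeError("encoding_dsv4 is unavailable in this build")
    return encoding_dsv4
```

Deferring the failure to `require_dsv4()` means importing the chat-template module no longer breaks non-V4 models, while V4 users still get a clear error.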
