[Feature]: Lora support #228
Open
princepride wants to merge 11 commits into
Code Review
This pull request introduces LoRA support for Megatron actor training and vLLM adapter serving, including target module parsing, conversion between Hugging Face and Megatron module names, and runtime adapter loading in vLLM. It also wraps the main training loop in a try-finally block to ensure proper resource disposal. The review feedback highlights several issues: a hardcoded W&B API key in the shell script, potential file-not-found errors on non-zero ranks during multi-node colocated training when saving adapters only on rank 0, potential hangs at a distributed barrier if rank 0 fails during weight updates, and unconditionally marking W&B runs as successful even if the training process crashes.
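The last point in the review (a run marked successful even though training crashed) can be avoided by recording the outcome before cleanup runs. A minimal sketch, assuming a hypothetical `finish(exit_code)` callback that stands in for whatever the logging layer exposes (for W&B, `wandb.finish` accepts an `exit_code` argument). `run_with_cleanup` and the `dispose` callable are illustrative names, not code from this PR:

```python
from typing import Callable


def run_with_cleanup(
    train: Callable[[], None],
    dispose: Callable[[], None],
    finish: Callable[[int], None],
) -> int:
    """Run `train`, always dispose resources, and report the real exit status."""
    exit_code = 0
    try:
        train()
    except BaseException:
        # Remember the failure so the tracker is not told the run succeeded.
        exit_code = 1
        raise
    finally:
        try:
            dispose()
        finally:
            # Hypothetical hook; with W&B this could be
            # wandb.finish(exit_code=exit_code).
            finish(exit_code)
    return exit_code
```

The original exception still propagates to the caller; only the status passed to the tracker changes.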
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
main #264 added `from vllm.utils.system_utils import kill_process_tree` to `vllm_engine.py`, but the no-vllm test stub registered `vllm.utils` as a plain module, so `pytest tests/utils` failed at collection with "'vllm.utils' is not a package". Make `vllm.utils` a package and stub `system_utils.kill_process_tree`.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
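The fix described in this commit message relies on how Python tells packages from plain modules: a module in `sys.modules` is only treated as a package if it has a `__path__` attribute. The sketch below shows one way such a stub could be built. It is an illustration of the technique, not the PR's actual test helper; `install_vllm_stub` is a hypothetical name, and the stubbed `kill_process_tree` does not reproduce the real signature. It is meant for environments where vLLM is not installed, since it replaces any real `vllm` entries in `sys.modules`.

```python
import sys
import types


def install_vllm_stub() -> None:
    """Register a minimal fake `vllm` package tree in sys.modules."""

    def make_package(name: str) -> types.ModuleType:
        module = types.ModuleType(name)
        # A module only counts as a package if it has __path__; without it,
        # importing a submodule fails with "... is not a package".
        module.__path__ = []
        return module

    vllm = make_package("vllm")
    utils = make_package("vllm.utils")
    system_utils = types.ModuleType("vllm.utils.system_utils")

    def kill_process_tree(pid, *args, **kwargs):
        # No-op stand-in; the real function's signature is not reproduced.
        return None

    system_utils.kill_process_tree = kill_process_tree
    vllm.utils = utils
    utils.system_utils = system_utils
    sys.modules["vllm"] = vllm
    sys.modules["vllm.utils"] = utils
    sys.modules["vllm.utils.system_utils"] = system_utils


install_vllm_stub()
# Resolves to the stub because every dotted name is present in sys.modules.
from vllm.utils.system_utils import kill_process_tree
```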
Summary
#206
This PR adds end-to-end LoRA support for Megatron actor training with colocated vLLM rollout. When `--lora-rank > 0`, the training actor applies LoRA adapters via Megatron-Bridge, exports PEFT-format adapters after each weight update, and hot-loads them into vLLM through the runtime LoRA API. Rollout requests then use the adapter name instead of the base model path.
This enables parameter-efficient RL fine-tuning without full-weight sync between trainer and inference engines on every step.
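The refresh cycle described above can be sketched as plain request payloads. This is a hedged illustration only: the endpoint paths and the `lora_name` / `lora_path` fields follow vLLM's documented HTTP interface for dynamically serving LoRA adapters, but the PR does not state whether its `load_lora_adapter()` goes through HTTP or an in-process call. The function names and the adapter directory are hypothetical.

```python
from typing import Dict, List, Tuple

Request = Tuple[str, Dict[str, object]]


def adapter_refresh_requests(
    adapter_name: str, adapter_dir: str, already_loaded: bool
) -> List[Request]:
    """Return the (endpoint, payload) pairs for one unload-then-load cycle."""
    requests: List[Request] = []
    if already_loaded:
        # Drop the stale adapter first so the same name can be reused.
        requests.append(("/v1/unload_lora_adapter", {"lora_name": adapter_name}))
    requests.append(
        ("/v1/load_lora_adapter", {"lora_name": adapter_name, "lora_path": adapter_dir})
    )
    return requests


def rollout_request(
    prompt: str, base_model: str, adapter_name: str, lora_rank: int
) -> Dict[str, object]:
    """Build a completion payload; with LoRA on, "model" is the adapter name."""
    model = adapter_name if lora_rank > 0 else base_model
    return {"model": model, "prompt": prompt}


# Hypothetical export directory; the real location is not given in the PR.
steps = adapter_refresh_requests("vime_lora", "/tmp/vime_lora_export", already_loaded=True)
payload = rollout_request("2 + 2 = ?", "/root/models/Qwen3-8B", "vime_lora", lora_rank=8)
```

Reusing a fixed adapter name keeps the rollout side unchanged between steps; only the weights behind that name are swapped.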
Motivation
Full-weight synchronization between Megatron and colocated vLLM engines is expensive for large models. LoRA reduces the sync payload to small adapter weights and lets vLLM load adapters at runtime, which is a better fit for iterative RL training.
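To make the size difference concrete: for a linear layer with a `d_out × d_in` weight, a rank-`r` LoRA adapter adds `r * (d_in + d_out)` parameters (the `A` and `B` factors). The sketch below works this out for one layer. The rank of 8 matches the test command in this PR; the 4096 × 4096 shape is an illustrative assumption, not a figure taken from the Qwen3-8B config.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in the dense weight matrix (bias ignored)."""
    return d_in * d_out


d_in = d_out = 4096  # illustrative layer shape (assumption)
rank = 8             # matches --lora-rank 8 in the test command

lora = lora_param_count(d_in, d_out, rank)   # 65536
full = full_param_count(d_in, d_out)         # 16777216
ratio = lora / full                          # 0.00390625, i.e. 1/256
```

For this one layer the adapter is 1/256 of the dense weight, which is where the smaller sync payload comes from.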
Key Changes
Training (Megatron)
- `--lora-rank`, `--lora-alpha`, `--lora-dropout`, `--lora-type` (`lora` / `canonical_lora`), `--target-modules`, `--exclude-modules`, `--lora-adapter-name`
- `lora_utils.py`: target module parsing/conversion (HF ↔ Megatron), PEFT config generation, adapter export (`adapter_model.bin` + `adapter_config.json`)
- `model_provider.py`: apply LoRA directly after Bridge provider returns the actor model (instead of relying on pre-wrap hooks)
- `actor.py`: wake up offloaded model before LoRA export in `offload_train + colocate` mode
- `arguments.py`: auto-enable `vllm_enable_lora`, validate LoRA requires `--megatron-to-hf-mode bridge` and `--colocate`
Weight Sync (Trainer → vLLM)
- `update_weight_from_tensor.py`: when LoRA is enabled, skip full-weight IPC sync; export adapter and call `load_lora_adapter` on all rollout engines
- `vllm_engine.py`:
  - `VLLM_ALLOW_RUNTIME_LORA_UPDATING=1` when LoRA is enabled
  - `load_lora_adapter()` with unload-then-load flow
Rollout
- `vllm_rollout.py` / `vllm_streaming_rollout.py`: use adapter name (default `vime_lora`) as the `"model"` field in rollout requests when LoRA is enabled
Cleanup & Reliability
- `train.py`: wrap main loop in `try/finally` to dispose actor/rollout resources
- `actor_group.py` / `actor.py`: add `dispose()` for explicit wandb cleanup in Ray actors
- `logging_utils.py`: call `wandb.teardown()` after `wandb.finish()` to avoid `BrokenPipeError` noise on Ray actor exit
Tests
python3 train.py \
    --hf-checkpoint /root/models/Qwen3-8B \
    --ref-load /root/models/Qwen3-8B_torch_dist \
    --load /root/models/Qwen3-8B \
    --finetune \
    --no-load-optim \
    --no-load-rng \
    --save /root/vime_lora_runs/qwen3-8b-dapo-lora-smoke \
    --save-interval 1 \
    --megatron-to-hf-mode bridge \
    --actor-num-nodes 1 \
    --actor-num-gpus-per-node 1 \
    --colocate \
    --calculate-per-token-loss \
    --prompt-data /root/datasets/dapo-math-17k/dapo-math-17k.jsonl \
    --input-key prompt \
    --label-key label \
    --apply-chat-template \
    --rollout-shuffle \
    --rm-type deepscaler \
    --num-rollout 1 \
    --rollout-batch-size 1 \
    --n-samples-per-prompt 1 \
    --rollout-max-response-len 1024 \
    --rollout-temperature 0.8 \
    --global-batch-size 1 \
    --micro-batch-size 1 \
    --advantage-estimator grpo \
    --kl-loss-coef 0.00 \
    --kl-loss-type k1 \
    --kl-coef 0.00 \
    --entropy-coef 0.00 \
    --eps-clip 4e-4 \
    --optimizer adam \
    --lr 1e-6 \
    --lr-decay-style constant \
    --weight-decay 0.1 \
    --adam-beta1 0.9 \
    --adam-beta2 0.98 \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 1 \
    --context-parallel-size 1 \
    --expert-model-parallel-size 1 \
    --expert-tensor-parallel-size 1 \
    --use-dynamic-batch-size \
    --max-tokens-per-gpu 4096 \
    --rollout-num-gpus-per-engine 1 \
    --rollout-num-gpus 1 \
    --vllm-gpu-memory-utilization 0.45 \
    --vllm-max-cudagraph-capture-size 32 \
    --lora-rank 8 \
    --lora-alpha 16 \
    --lora-dropout 0.0 \
    --lora-type lora \
    --target-modules all-linear \
    --only-train-params-name-list lora_A lora_B linear_in linear_out \
    --lora-adapter-name vime_lora \
    --attention-dropout 0.0 \
    --hidden-dropout 0.0 \
    --accumulate-allreduce-grads-in-fp32 \
    --attention-softmax-in-fp32 \
    --attention-backend flash \
    --use-wandb \
    --wandb-host https://api.wandb.ai \
    --wandb-project vime-lora-smoke \
    --wandb-group qwen3-8b-dapo-single-h100 \
    --wandb-key "${WANDB_API_KEY}" \
    --disable-wandb-random-suffix
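The command above passes `--megatron-to-hf-mode bridge` and `--colocate` because, per the Key Changes, `arguments.py` rejects LoRA runs without them and turns on `vllm_enable_lora` automatically. A minimal sketch of that kind of check, assuming only the flag names quoted in this PR; `build_parser` and `validate_lora_args` are hypothetical helpers, and the defaults and error messages are illustrative rather than the PR's own:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the flags needed for the LoRA checks."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--lora-rank", type=int, default=0)
    # Default left unset here; the project's real default is not shown in the PR.
    parser.add_argument("--megatron-to-hf-mode", type=str, default=None)
    parser.add_argument("--colocate", action="store_true")
    return parser


def validate_lora_args(args: argparse.Namespace) -> argparse.Namespace:
    """Enforce LoRA prerequisites and auto-enable LoRA on the vLLM side."""
    if not hasattr(args, "vllm_enable_lora"):
        args.vllm_enable_lora = False
    if args.lora_rank <= 0:
        return args  # LoRA disabled: nothing to validate
    if args.megatron_to_hf_mode != "bridge":
        raise ValueError("LoRA requires --megatron-to-hf-mode bridge")
    if not args.colocate:
        raise ValueError("LoRA requires --colocate")
    args.vllm_enable_lora = True
    return args


args = validate_lora_args(
    build_parser().parse_args(
        ["--lora-rank", "8", "--megatron-to-hf-mode", "bridge", "--colocate"]
    )
)
```

Failing at argument-parsing time surfaces an unsupported combination before any model or engine is launched.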