
[Feature]: LoRA support #228

Open
princepride wants to merge 11 commits into main from lora-support

princepride (Collaborator) commented Jun 10, 2026

Summary

#206
This PR adds end-to-end LoRA support for Megatron actor training with colocated vLLM rollout. When --lora-rank > 0, the training actor applies LoRA adapters via Megatron-Bridge, exports PEFT-format adapters after each weight update, and hot-loads them into vLLM through the runtime LoRA API. Rollout requests then use the adapter name instead of the base model path.

This enables parameter-efficient RL fine-tuning without full-weight sync between trainer and inference engines on every step.

Motivation

Full-weight synchronization between Megatron and colocated vLLM engines is expensive for large models. LoRA reduces the sync payload to small adapter weights and lets vLLM load adapters at runtime, which is a better fit for iterative RL training.

Key Changes

Training (Megatron)

  • New CLI args: --lora-rank, --lora-alpha, --lora-dropout, --lora-type (lora / canonical_lora), --target-modules, --exclude-modules, --lora-adapter-name
  • lora_utils.py: target module parsing/conversion (HF ↔ Megatron), PEFT config generation, adapter export (adapter_model.bin + adapter_config.json)
  • model_provider.py: apply LoRA directly after Bridge provider returns the actor model (instead of relying on pre-wrap hooks)
  • actor.py: wake up the offloaded model before LoRA export in offload_train + colocate mode
  • arguments.py: auto-enable vllm_enable_lora, and validate that LoRA is only used together with --megatron-to-hf-mode bridge and --colocate
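A minimal sketch of the target-module conversion and PEFT config generation described for lora_utils.py, assuming a Llama/Qwen-style decoder and the fused `lora` type. The HF ↔ Megatron name mapping and every function name below are hypothetical illustrations, not the PR's code; weight export to adapter_model.bin (which needs torch) is left out.

```python
import json
import os
import tempfile
from typing import Dict, List, Sequence

# Hypothetical HF -> Megatron name mapping for a Llama/Qwen-style decoder
# using the fused "lora" type: Megatron packs q/k/v into linear_qkv and
# gate/up into linear_fc1, so several HF names collapse onto one module.
HF_TO_MEGATRON: Dict[str, str] = {
    "q_proj": "linear_qkv",
    "k_proj": "linear_qkv",
    "v_proj": "linear_qkv",
    "o_proj": "linear_proj",
    "gate_proj": "linear_fc1",
    "up_proj": "linear_fc1",
    "down_proj": "linear_fc2",
}
MEGATRON_TO_HF: Dict[str, List[str]] = {
    "linear_qkv": ["q_proj", "k_proj", "v_proj"],
    "linear_proj": ["o_proj"],
    "linear_fc1": ["gate_proj", "up_proj"],
    "linear_fc2": ["down_proj"],
}
ALL_LINEAR = ["linear_qkv", "linear_proj", "linear_fc1", "linear_fc2"]


def to_megatron_targets(target_modules: Sequence[str]) -> List[str]:
    """Expand 'all-linear' and map HF names to deduplicated Megatron names."""
    if list(target_modules) == ["all-linear"]:
        return list(ALL_LINEAR)
    targets: List[str] = []
    for name in target_modules:
        mapped = HF_TO_MEGATRON.get(name, name)  # Megatron names pass through
        if mapped not in targets:
            targets.append(mapped)
    return targets


def peft_lora_config(megatron_targets: Sequence[str], rank: int, alpha: int,
                     dropout: float, base_model: str) -> Dict[str, object]:
    """Build a PEFT-style adapter_config.json payload using HF module names."""
    hf_targets = sorted({hf for t in megatron_targets
                         for hf in MEGATRON_TO_HF.get(t, [t])})
    return {
        "peft_type": "LORA",
        "task_type": "CAUSAL_LM",
        "base_model_name_or_path": base_model,
        "r": rank,
        "lora_alpha": alpha,
        "lora_dropout": dropout,
        "target_modules": hf_targets,
        "bias": "none",
    }


def write_adapter_config(out_dir: str, config: Dict[str, object]) -> str:
    """Write adapter_config.json into out_dir and return its path."""
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "adapter_config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return path


if __name__ == "__main__":
    targets = to_megatron_targets(["all-linear"])
    cfg = peft_lora_config(targets, rank=8, alpha=16, dropout=0.0,
                           base_model="/root/models/Qwen3-8B")
    with tempfile.TemporaryDirectory() as tmp:
        print(write_adapter_config(tmp, cfg))
        print(cfg["target_modules"])
```

Exporting HF names (rather than Megatron's fused names) is what lets vLLM resolve the adapter against the HF base model; the `canonical_lora` type would need a different, unfused mapping.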

Weight Sync (Trainer → vLLM)

  • update_weight_from_tensor.py: when LoRA is enabled, skip full-weight IPC sync; export adapter and call load_lora_adapter on all rollout engines
  • vllm_engine.py:
    • set VLLM_ALLOW_RUNTIME_LORA_UPDATING=1 when LoRA is enabled
    • add load_lora_adapter() with unload-then-load flow
    • handle vLLM's non-JSON / empty 200 responses on LoRA endpoints
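The unload-then-load flow and the tolerant response handling can be sketched against vLLM's runtime LoRA HTTP endpoints (`/v1/unload_lora_adapter` and `/v1/load_lora_adapter`, taking `lora_name` / `lora_path`), which vLLM only exposes when runtime LoRA updating is allowed. This is a stdlib-only sketch; `reload_lora_adapter`, `parse_body`, and the injectable `post` callable are hypothetical names, and the PR's load_lora_adapter() may use a different transport.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Dict, Tuple

# (url, json_payload) -> (status_code, response_text)
Post = Callable[[str, Dict[str, Any]], Tuple[int, str]]


def http_post(url: str, payload: Dict[str, Any], timeout: float = 60.0) -> Tuple[int, str]:
    """POST JSON and return (status, body) without raising on 4xx/5xx."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def parse_body(body: str) -> Dict[str, Any]:
    """Tolerate empty and plain-text 200 responses from the LoRA endpoints."""
    text = body.strip()
    if not text:
        return {}
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return {"message": text}
    return parsed if isinstance(parsed, dict) else {"message": parsed}


def reload_lora_adapter(base_url: str, lora_name: str, lora_path: str,
                        post: Post = http_post) -> Dict[str, Any]:
    """Unload-then-load so the engine picks up the freshly exported weights."""
    # A non-200 here typically just means the adapter was not loaded yet
    # (e.g. on the first weight update), so the result is deliberately ignored.
    post(f"{base_url}/v1/unload_lora_adapter", {"lora_name": lora_name})
    status, body = post(f"{base_url}/v1/load_lora_adapter",
                        {"lora_name": lora_name, "lora_path": lora_path})
    if status != 200:
        raise RuntimeError(f"load_lora_adapter failed ({status}): {body}")
    return parse_body(body)
```

Unloading first matters because reloading under an already registered name would otherwise be rejected or keep serving the stale adapter, depending on the vLLM version.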

Rollout

  • vllm_rollout.py / vllm_streaming_rollout.py: use adapter name (default vime_lora) as the "model" field in rollout requests when LoRA is enabled
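A small sketch of the request-side switch, assuming an OpenAI-compatible completion payload (the exact request shape used by vllm_rollout.py is not shown here, so `build_completion_request` and its fields are hypothetical). vLLM's OpenAI-compatible server routes a request to a loaded LoRA adapter when the `model` field names that adapter.

```python
from typing import Any, Dict, Optional

DEFAULT_ADAPTER_NAME = "vime_lora"  # default named in the PR description


def rollout_model_field(hf_checkpoint: str, lora_rank: int,
                        adapter_name: Optional[str] = None) -> str:
    """Pick the adapter name when LoRA is on, else the base model path."""
    if lora_rank > 0:
        return adapter_name or DEFAULT_ADAPTER_NAME
    return hf_checkpoint


def build_completion_request(prompt: str, hf_checkpoint: str, lora_rank: int,
                             adapter_name: Optional[str] = None,
                             max_tokens: int = 1024,
                             temperature: float = 0.8) -> Dict[str, Any]:
    # Hypothetical payload shape; only the "model" switch mirrors the PR text.
    return {
        "model": rollout_model_field(hf_checkpoint, lora_rank, adapter_name),
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
```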

Cleanup & Reliability

  • train.py: wrap main loop in try/finally to dispose actor/rollout resources
  • actor_group.py / actor.py: add dispose() for explicit wandb cleanup in Ray actors
  • logging_utils.py: call wandb.teardown() after wandb.finish() to avoid BrokenPipeError noise on Ray actor exit
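The try/finally disposal pattern can be sketched as below. It also passes an exit code to each resource so that a crash is not reported as success, which is the concern raised in the review further down. `run_with_cleanup` and the `Disposable` protocol are hypothetical; in a Ray actor, `dispose` might call `wandb.finish(exit_code=exit_code)` and then `wandb.teardown()`, but that is an assumption about the PR's dispose().

```python
from typing import Callable, List, Protocol


class Disposable(Protocol):
    def dispose(self, exit_code: int) -> None: ...


# Example resource (hypothetical):
#   class WandbActor:
#       def dispose(self, exit_code):
#           wandb.finish(exit_code=exit_code)  # non-zero marks the run failed
#           wandb.teardown()                   # shut down the wandb service


def run_with_cleanup(main: Callable[[], None], resources: List[Disposable]) -> int:
    exit_code = 1  # assume failure until main() returns normally
    try:
        main()
        exit_code = 0
    finally:
        # Dispose in reverse creation order; one failing dispose must not
        # prevent the others from running.
        for res in reversed(resources):
            try:
                res.dispose(exit_code)
            except Exception as err:  # noqa: BLE001
                print(f"dispose failed for {res!r}: {err}")
    return exit_code
```

If `main()` raises, the exception still propagates after cleanup, so the process exits non-zero while every resource has been told the run failed.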

Tests

python3 train.py \
  --hf-checkpoint /root/models/Qwen3-8B \
  --ref-load /root/models/Qwen3-8B_torch_dist \
  --load /root/models/Qwen3-8B \
  --finetune \
  --no-load-optim \
  --no-load-rng \
  --save /root/vime_lora_runs/qwen3-8b-dapo-lora-smoke \
  --save-interval 1 \
  --megatron-to-hf-mode bridge \
  --actor-num-nodes 1 \
  --actor-num-gpus-per-node 1 \
  --colocate \
  --calculate-per-token-loss \
  --prompt-data /root/datasets/dapo-math-17k/dapo-math-17k.jsonl \
  --input-key prompt \
  --label-key label \
  --apply-chat-template \
  --rollout-shuffle \
  --rm-type deepscaler \
  --num-rollout 1 \
  --rollout-batch-size 1 \
  --n-samples-per-prompt 1 \
  --rollout-max-response-len 1024 \
  --rollout-temperature 0.8 \
  --global-batch-size 1 \
  --micro-batch-size 1 \
  --advantage-estimator grpo \
  --kl-loss-coef 0.00 \
  --kl-loss-type k1 \
  --kl-coef 0.00 \
  --entropy-coef 0.00 \
  --eps-clip 4e-4 \
  --optimizer adam \
  --lr 1e-6 \
  --lr-decay-style constant \
  --weight-decay 0.1 \
  --adam-beta1 0.9 \
  --adam-beta2 0.98 \
  --tensor-model-parallel-size 1 \
  --pipeline-model-parallel-size 1 \
  --context-parallel-size 1 \
  --expert-model-parallel-size 1 \
  --expert-tensor-parallel-size 1 \
  --use-dynamic-batch-size \
  --max-tokens-per-gpu 4096 \
  --rollout-num-gpus-per-engine 1 \
  --rollout-num-gpus 1 \
  --vllm-gpu-memory-utilization 0.45 \
  --vllm-max-cudagraph-capture-size 32 \
  --lora-rank 8 \
  --lora-alpha 16 \
  --lora-dropout 0.0 \
  --lora-type lora \
  --target-modules all-linear \
  --only-train-params-name-list lora_A lora_B linear_in linear_out \
  --lora-adapter-name vime_lora \
  --attention-dropout 0.0 \
  --hidden-dropout 0.0 \
  --accumulate-allreduce-grads-in-fp32 \
  --attention-softmax-in-fp32 \
  --attention-backend flash \
  --use-wandb \
  --wandb-host https://api.wandb.ai \
  --wandb-project vime-lora-smoke \
  --wandb-group qwen3-8b-dapo-single-h100 \
  --wandb-key "${WANDB_API_KEY}" \
  --disable-wandb-random-suffix
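To put the smoke-test settings (--lora-rank 8, --lora-alpha 16) in perspective, a quick calculation for a single square projection. The 4096 × 4096 layer size is a hypothetical example, and the scaling assumes standard LoRA (alpha / r, as in PEFT without rsLoRA).

```python
def full_params(d_in: int, d_out: int) -> int:
    # Dense weight W: d_out x d_in
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # A: rank x d_in, B: d_out x rank
    return rank * (d_in + d_out)


rank, alpha = 8, 16
scaling = alpha / rank          # update is W + scaling * (B @ A)
d = 4096                        # hypothetical square projection size
full = full_params(d, d)        # 16,777,216 weights
lora = lora_params(d, d, rank)  # 65,536 weights
print(f"scaling={scaling}, full={full:,}, lora={lora:,}, ratio={full // lora}x")
```

For this layer the adapter is 256x smaller than the dense weight, which is where the reduced sync payload comes from.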

gemini-code-assist bot left a comment

Code Review

This pull request introduces LoRA support for Megatron actor training and vLLM adapter serving, including target module parsing, conversion between Hugging Face and Megatron module names, and runtime adapter loading in vLLM. It also wraps the main training loop in a try-finally block to ensure proper resource disposal. The review feedback highlights several issues: a hardcoded W&B API key in the shell script, potential file-not-found errors on non-zero ranks during multi-node colocated training when saving adapters only on rank 0, potential hangs at a distributed barrier if rank 0 fails during weight updates, and unconditionally marking W&B runs as successful even if the training process crashes.
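One way to avoid the barrier hang the review describes is to replace a bare barrier with a collective that carries rank 0's success status, so every rank either gets the adapter path or raises. This is a sketch: `export_on_rank0` is hypothetical, and `broadcast` is injected so the logic runs without torch. With torch it could wrap `torch.distributed.broadcast_object_list`. The returned path is only usable by other nodes if it lives on shared storage, which is the separate multi-node issue the review raises.

```python
from typing import Callable, Tuple

# broadcast(obj, src) must return rank `src`'s obj on every rank, e.g.:
#   def torch_broadcast(obj, src):
#       box = [obj]
#       torch.distributed.broadcast_object_list(box, src=src)
#       return box[0]
Broadcast = Callable[[object, int], object]


def export_on_rank0(rank: int, export_fn: Callable[[], str],
                    broadcast: Broadcast) -> str:
    """Export on rank 0 only; all ranks learn whether it succeeded."""
    status: Tuple[bool, str] = (False, "")
    if rank == 0:
        try:
            status = (True, export_fn())  # e.g. adapter directory path
        except Exception as err:  # noqa: BLE001
            status = (False, f"{type(err).__name__}: {err}")
    # Every rank reaches this collective, even when rank 0's export failed,
    # so no rank is left waiting at a barrier that rank 0 never reaches.
    ok, payload = broadcast(status, 0)
    if not ok:
        raise RuntimeError(f"adapter export failed on rank 0: {payload}")
    return payload
```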

Comment threads (outdated):
  • train.sh
  • vime/backends/megatron_utils/lora_utils.py
  • vime/backends/megatron_utils/update_weight/update_weight_from_tensor.py
  • vime/utils/logging_utils.py
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
read-the-docs-community bot commented Jun 15, 2026

princepride and others added 3 commits June 18, 2026 16:21
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
main #264 added `from vllm.utils.system_utils import kill_process_tree`
to vllm_engine.py but the no-vllm test stub registered vllm.utils as a
plain module, so `pytest tests/utils` failed at collection with
"'vllm.utils' is not a package". Make vllm.utils a package and stub
system_utils.kill_process_tree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
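The fix described in that commit message can be sketched as follows. A module becomes a package once it has a `__path__` attribute, and registering the submodule in `sys.modules` lets `from vllm.utils.system_utils import kill_process_tree` resolve without vLLM installed. `install_vllm_stub` is a hypothetical helper, and the permissive `kill_process_tree` signature is a placeholder, not vLLM's.

```python
import sys
import types


def _make_package(name: str) -> types.ModuleType:
    mod = types.ModuleType(name)
    mod.__path__ = []  # a __path__ attribute is what marks a module as a package
    return mod


def install_vllm_stub() -> None:
    """Register fake vllm, vllm.utils and vllm.utils.system_utils modules."""
    vllm = _make_package("vllm")
    utils = _make_package("vllm.utils")
    system_utils = types.ModuleType("vllm.utils.system_utils")

    def kill_process_tree(*args, **kwargs):
        """No-op stand-in; the stub never spawns real vLLM processes."""
        return None

    system_utils.kill_process_tree = kill_process_tree
    vllm.utils = utils
    utils.system_utils = system_utils
    sys.modules.update({
        "vllm": vllm,
        "vllm.utils": utils,
        "vllm.utils.system_utils": system_utils,
    })
```

Without `__path__` on `vllm.utils`, Python treats it as a plain module and fails with "'vllm.utils' is not a package" as soon as it must look up a submodule that is not already in `sys.modules`.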
aoshen02 mentioned this pull request on Jun 21, 2026
14 tasks
Signed-off-by: princepride <wangzhipeng628@gmail.com>
