[Feature]: Lora support by princepride · Pull Request #228 · vllm-project/vime

princepride · 2026-06-10T03:47:27Z

Summary

#206
This PR adds end-to-end LoRA support for Megatron actor training with colocated vLLM rollout. When --lora-rank > 0, the training actor applies LoRA adapters via Megatron-Bridge, exports PEFT-format adapters after each weight update, and hot-loads them into vLLM through the runtime LoRA API. Rollout requests then use the adapter name instead of the base model path.

This enables parameter-efficient RL fine-tuning without full-weight sync between trainer and inference engines on every step.

Motivation

Full-weight synchronization between Megatron and colocated vLLM engines is expensive for large models. LoRA reduces the sync payload to small adapter weights and lets vLLM load adapters at runtime, which is a better fit for iterative RL training.

Key Changes

Training (Megatron)

New CLI args: --lora-rank, --lora-alpha, --lora-dropout, --lora-type (lora / canonical_lora), --target-modules, --exclude-modules, --lora-adapter-name
lora_utils.py: target module parsing/conversion (HF ↔ Megatron), PEFT config generation, adapter export (adapter_model.bin + adapter_config.json)
model_provider.py: apply LoRA directly after Bridge provider returns the actor model (instead of relying on pre-wrap hooks)
actor.py: wake up offloaded model before LoRA export in offload_train + colocate mode
arguments.py: auto-enable vllm_enable_lora, validate LoRA requires --megatron-to-hf-mode bridge and --colocate
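The argument checks described for arguments.py can be sketched as follows. This is a minimal illustration, not the PR's code: the helper name `validate_lora_args` is hypothetical, and the attribute names are assumed to mirror the CLI flags listed above.

```python
from argparse import Namespace


def validate_lora_args(args: Namespace) -> Namespace:
    """Hypothetical helper mirroring the described checks (not the PR's code)."""
    # LoRA is considered enabled only when --lora-rank > 0.
    if getattr(args, "lora_rank", 0) <= 0:
        return args

    # LoRA is stated to require bridge mode and colocated rollout.
    if getattr(args, "megatron_to_hf_mode", None) != "bridge":
        raise ValueError("LoRA requires --megatron-to-hf-mode bridge")
    if not getattr(args, "colocate", False):
        raise ValueError("LoRA requires --colocate")

    # Auto-enable LoRA serving on the vLLM side.
    args.vllm_enable_lora = True
    return args
```

With `--lora-rank 0` the namespace is returned unchanged, so non-LoRA runs are unaffected by the check.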

Weight Sync (Trainer → vLLM)

update_weight_from_tensor.py: when LoRA is enabled, skip full-weight IPC sync; export adapter and call load_lora_adapter on all rollout engines
vllm_engine.py:
- set VLLM_ALLOW_RUNTIME_LORA_UPDATING=1 when LoRA is enabled
- add load_lora_adapter() with unload-then-load flow
- handle vLLM's non-JSON / empty 200 responses on LoRA endpoints
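The unload-then-load flow and the tolerant response handling can be sketched as below. This is a sketch under assumptions: it targets vLLM's `/v1/unload_lora_adapter` and `/v1/load_lora_adapter` HTTP endpoints (which need `VLLM_ALLOW_RUNTIME_LORA_UPDATING` set on the server), the function names are hypothetical, and ignoring a failed unload on the first sync is an assumption rather than the PR's confirmed behavior.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Dict, Tuple

PostFn = Callable[[str, Dict[str, str]], Tuple[int, str]]


def http_post(url: str, payload: Dict[str, str]) -> Tuple[int, str]:
    """POST JSON and return (status, body) without raising on HTTP error codes."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def parse_lora_response(status: int, body: str) -> dict:
    """Accept empty or plain-text 200 responses instead of requiring JSON."""
    if status != 200:
        raise RuntimeError(f"LoRA endpoint returned {status}: {body!r}")
    text = body.strip()
    try:
        parsed = json.loads(text) if text else None
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, dict):
        return parsed
    return {"ok": True, "message": text}


def reload_lora_adapter(base_url: str, name: str, path: str, post: PostFn = http_post) -> dict:
    """Unload any previous adapter with this name, then load the fresh export."""
    # Assumption: a failed unload only means nothing was loaded yet, so it is ignored.
    post(f"{base_url}/v1/unload_lora_adapter", {"lora_name": name})
    status, body = post(
        f"{base_url}/v1/load_lora_adapter",
        {"lora_name": name, "lora_path": path},
    )
    return parse_lora_response(status, body)
```

Passing `post` as a parameter keeps the flow testable without a running engine.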

Rollout

vllm_rollout.py / vllm_streaming_rollout.py: use adapter name (default vime_lora) as the "model" field in rollout requests when LoRA is enabled
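The model-field switch can be illustrated with a small payload builder. This is a sketch only: `build_rollout_payload` and its parameters are hypothetical names, and the payload shape assumes an OpenAI-style completions request.

```python
from typing import Any, Dict


def build_rollout_payload(
    prompt: str,
    sampling_params: Dict[str, Any],
    base_model_path: str,
    lora_rank: int = 0,
    lora_adapter_name: str = "vime_lora",
) -> Dict[str, Any]:
    """Hypothetical builder: route the request to the adapter when LoRA is on."""
    # With LoRA enabled, vLLM selects the adapter by the name given in "model";
    # otherwise the request targets the base model path as before.
    model = lora_adapter_name if lora_rank > 0 else base_model_path
    return {"model": model, "prompt": prompt, **sampling_params}
```

For example, `build_rollout_payload("2+2=", {"temperature": 0.8}, "/root/models/Qwen3-8B", lora_rank=8)` sets `"model"` to `vime_lora`.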

Cleanup & Reliability

train.py: wrap main loop in try/finally to dispose actor/rollout resources
actor_group.py / actor.py: add dispose() for explicit wandb cleanup in Ray actors
logging_utils.py: call wandb.teardown() after wandb.finish() to avoid BrokenPipeError noise on Ray actor exit
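The cleanup pattern above can be sketched as a try/finally wrapper. This is an illustration, not the PR's implementation: `run_with_cleanup` is a hypothetical name, the wandb module is passed in so the sketch runs without wandb installed, and forwarding a non-zero exit code on failure reflects the review feedback about crashed runs rather than confirmed PR behavior.

```python
from typing import Any, Callable, Iterable, Optional


def run_with_cleanup(
    train_fn: Callable[[], Any],
    resources: Iterable[Any],
    wandb_module: Optional[Any] = None,
) -> Any:
    """Run the training loop and always dispose resources afterwards."""
    failed = True
    try:
        result = train_fn()
        failed = False
        return result
    finally:
        # Dispose every resource even if an earlier dispose() raises.
        for res in resources:
            try:
                res.dispose()
            except Exception as exc:
                print(f"dispose failed for {res!r}: {exc}")
        if wandb_module is not None:
            # finish() first, then teardown(), matching the order described above.
            wandb_module.finish(exit_code=1 if failed else 0)
            wandb_module.teardown()
```

In real use, `wandb_module` would be the imported `wandb` package and `resources` the actor and rollout handles.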

Tests

python3 train.py \
  --hf-checkpoint /root/models/Qwen3-8B \
  --ref-load /root/models/Qwen3-8B_torch_dist \
  --load /root/models/Qwen3-8B \
  --finetune \
  --no-load-optim \
  --no-load-rng \
  --save /root/vime_lora_runs/qwen3-8b-dapo-lora-smoke \
  --save-interval 1 \
  --megatron-to-hf-mode bridge \
  --actor-num-nodes 1 \
  --actor-num-gpus-per-node 1 \
  --colocate \
  --calculate-per-token-loss \
  --prompt-data /root/datasets/dapo-math-17k/dapo-math-17k.jsonl \
  --input-key prompt \
  --label-key label \
  --apply-chat-template \
  --rollout-shuffle \
  --rm-type deepscaler \
  --num-rollout 1 \
  --rollout-batch-size 1 \
  --n-samples-per-prompt 1 \
  --rollout-max-response-len 1024 \
  --rollout-temperature 0.8 \
  --global-batch-size 1 \
  --micro-batch-size 1 \
  --advantage-estimator grpo \
  --kl-loss-coef 0.00 \
  --kl-loss-type k1 \
  --kl-coef 0.00 \
  --entropy-coef 0.00 \
  --eps-clip 4e-4 \
  --optimizer adam \
  --lr 1e-6 \
  --lr-decay-style constant \
  --weight-decay 0.1 \
  --adam-beta1 0.9 \
  --adam-beta2 0.98 \
  --tensor-model-parallel-size 1 \
  --pipeline-model-parallel-size 1 \
  --context-parallel-size 1 \
  --expert-model-parallel-size 1 \
  --expert-tensor-parallel-size 1 \
  --use-dynamic-batch-size \
  --max-tokens-per-gpu 4096 \
  --rollout-num-gpus-per-engine 1 \
  --rollout-num-gpus 1 \
  --vllm-gpu-memory-utilization 0.45 \
  --vllm-max-cudagraph-capture-size 32 \
  --lora-rank 8 \
  --lora-alpha 16 \
  --lora-dropout 0.0 \
  --lora-type lora \
  --target-modules all-linear \
  --only-train-params-name-list lora_A lora_B linear_in linear_out \
  --lora-adapter-name vime_lora \
  --attention-dropout 0.0 \
  --hidden-dropout 0.0 \
  --accumulate-allreduce-grads-in-fp32 \
  --attention-softmax-in-fp32 \
  --attention-backend flash \
  --use-wandb \
  --wandb-host https://api.wandb.ai \
  --wandb-project vime-lora-smoke \
  --wandb-group qwen3-8b-dapo-single-h100 \
  --wandb-key "${WANDB_API_KEY}" \
  --disable-wandb-random-suffix

gemini-code-assist

Code Review

This pull request introduces LoRA support for Megatron actor training and vLLM adapter serving, including target module parsing, conversion between Hugging Face and Megatron module names, and runtime adapter loading in vLLM. It also wraps the main training loop in a try-finally block to ensure proper resource disposal. The review feedback highlights several issues: a hardcoded W&B API key in the shell script, potential file-not-found errors on non-zero ranks during multi-node colocated training when saving adapters only on rank 0, potential hangs at a distributed barrier if rank 0 fails during weight updates, and unconditionally marking W&B runs as successful even if the training process crashes.

read-the-docs-community · 2026-06-15T13:47:08Z

Documentation build overview

📚 vime | 🛠️ Build #33308330 | 📁 Comparing 2aa99b9 against latest (491665d)

🔍 Preview build

27 files changed · + 1 added · ± 26 modified

+ Added

_examples_synced/tau-bench/README.html

± Modified

main #264 added `from vllm.utils.system_utils import kill_process_tree` to vllm_engine.py, but the no-vllm test stub registered vllm.utils as a plain module, so `pytest tests/utils` failed at collection with "'vllm.utils' is not a package". Make vllm.utils a package and stub system_utils.kill_process_tree.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: princepride <wangzhipeng628@gmail.com>
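The stub fix in this commit message can be sketched as follows. This is a minimal illustration, not the repository's actual test stub: `install_vllm_stub` is a hypothetical name, and the no-op `kill_process_tree` stands in for the real function. The key detail is that a module counts as a package only if it has a `__path__` attribute.

```python
import sys
import types


def install_vllm_stub() -> None:
    """Register a fake vllm tree so imports work without vLLM installed."""
    vllm = types.ModuleType("vllm")
    vllm.__path__ = []  # mark as a package

    utils = types.ModuleType("vllm.utils")
    utils.__path__ = []  # without this, submodule imports fail: "not a package"

    system_utils = types.ModuleType("vllm.utils.system_utils")
    # Stand-in for the real function; it does nothing in tests.
    system_utils.kill_process_tree = lambda pid: None

    # Wire up attributes so `vllm.utils.system_utils` also resolves by attribute access.
    vllm.utils = utils
    utils.system_utils = system_utils

    sys.modules["vllm"] = vllm
    sys.modules["vllm.utils"] = utils
    sys.modules["vllm.utils.system_utils"] = system_utils
```

After calling `install_vllm_stub()`, the statement `from vllm.utils.system_utils import kill_process_tree` resolves against the stub.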

gemini-code-assist Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread train.sh Outdated

Comment thread vime/backends/megatron_utils/lora_utils.py Outdated

Comment thread vime/backends/megatron_utils/update_weight/update_weight_from_tensor.py Outdated

Comment thread vime/utils/logging_utils.py Outdated

princepride added 7 commits June 15, 2026 21:25

add lora support

142ec26

Signed-off-by: princepride <wangzhipeng628@gmail.com>

fix some bug

ecb824d

Signed-off-by: princepride <wangzhipeng628@gmail.com>

fix some bug

97b550c

Signed-off-by: princepride <wangzhipeng628@gmail.com>

Delete train.sh

2d8ee93

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

fix some bug

be5aa21

Signed-off-by: princepride <wangzhipeng628@gmail.com>

fix some bug

ec4201d

Signed-off-by: princepride <wangzhipeng628@gmail.com>

fix pre-commit error

cdc0155

Signed-off-by: princepride <wangzhipeng628@gmail.com>

princepride force-pushed the lora-support branch from 0dbe374 to cdc0155 Compare June 15, 2026 13:46

princepride and others added 3 commits June 18, 2026 16:21

Merge branch 'main' into lora-support

1705d9c

Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>

Merge branch 'main' into lora-support

f824866

Signed-off-by: princepride <wangzhipeng628@gmail.com>

princepride force-pushed the lora-support branch from a230513 to d9b580b Compare June 21, 2026 09:03

aoshen02 mentioned this pull request Jun 21, 2026

[RFC] VIME Roadmap #11

Open

14 tasks

fix some bug

2aa99b9

Signed-off-by: princepride <wangzhipeng628@gmail.com>

princepride wants to merge 11 commits into main from lora-support

1 participant
