feat(enflame): add GCU platform, engines, and runtime shims for verl 0.9 #6

Open
gongxijun wants to merge 4 commits into verl-project:main from gongxijun:main

@gongxijun

Add Enflame GCU support to verl-hardware-plugin for verl 0.9+ plugin architecture (PlatformRegistry + EngineRegistry).

Platform (VERL_PLATFORM=enflame):

  • Register PlatformENFLAME with device_name="gcu" and vendor_name="enflame" so upstream get_device_name()/c10d backends use torch.gcu while engine lookup uses (gcu, enflame).
  • ECCL/FlagCX communication backend, TOPS_VISIBLE_DEVICES, Ray GPU resource.
  • Apply torch.gcu runtime shims on first access: no-op ipc_collect (required by verl vLLM weight-transfer cleanup) and Stream.cuda_stream compat for FlagCX.
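
The shim idea in the last bullet can be sketched roughly as follows. This is a minimal illustration only: `apply_runtime_shims` and `fake_gcu` are hypothetical names, `fake_gcu` is a stand-in module so the snippet runs without Enflame hardware, and the plugin's actual shim code may be structured differently.

```python
import types


def apply_runtime_shims(device_module: types.ModuleType) -> list[str]:
    """Install missing compatibility hooks on a device module.

    Returns the names of the attributes that were added, so a second call
    is a no-op and the result is easy to log.
    """
    added = []
    # Cleanup code written for CUDA may call ipc_collect(); on a backend
    # that lacks it, a no-op keeps that cleanup path from raising.
    if not hasattr(device_module, "ipc_collect"):
        device_module.ipc_collect = lambda: None
        added.append("ipc_collect")
    return added


# Hypothetical stand-in for torch.gcu (assumption, not the real module).
fake_gcu = types.ModuleType("fake_gcu")
first = apply_runtime_shims(fake_gcu)
second = apply_runtime_shims(fake_gcu)
```

Checking `hasattr` first keeps the shim idempotent and leaves a native implementation untouched if a later torch_gcu release provides one.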

Engines:

  • FSDP/FSDP2 LM and value head engines (device="gcu", vendor="enflame").
  • Megatron LM head engine (device="gcu", vendor="enflame").
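
To illustrate why engines are keyed by both device and vendor, here is a toy registry. It is not verl's EngineRegistry API: `ToyEngineRegistry`, `ToyEnflameFSDPEngine`, and the `"language_model"` key are all hypothetical names chosen for this sketch.

```python
from typing import Callable, Dict, Tuple

EngineKey = Tuple[str, str, str]  # (model_type, device, vendor)


class ToyEngineRegistry:
    """Toy lookup table; an illustration, not verl's EngineRegistry."""

    def __init__(self) -> None:
        self._engines: Dict[EngineKey, type] = {}

    def register(self, model_type: str, device: str, vendor: str) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            self._engines[(model_type, device, vendor)] = cls
            return cls

        return decorator

    def get(self, model_type: str, device: str, vendor: str) -> type:
        key = (model_type, device, vendor)
        try:
            return self._engines[key]
        except KeyError:
            raise LookupError(f"no engine registered for {key}") from None


registry = ToyEngineRegistry()


@registry.register("language_model", device="gcu", vendor="enflame")
class ToyEnflameFSDPEngine:
    """Placeholder for an FSDP engine bound to (gcu, enflame)."""


engine_cls = registry.get("language_model", "gcu", "enflame")
```

Including the vendor in the key matters when two platforms share a device name; the log later in this thread shows `metax` registered with device `cuda`, for example.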

Also add registration wiring, unit tests, user guide, and README/development doc updates.

Summary

Motivation

Changes

Testing

  • [√] pytest tests/ -v passes
  • [√] Manually verified on target hardware (if applicable)

Acceptance Baseline (for new hardware adaptation PRs)

  • Ran scripts/baseline_grpo_gsm8k.sh on target hardware (8 devices)
  • Training completed all epochs without error
  • critic/rewards/mean shows clear upward trend in first 100 steps
  • Curve is consistent with NVIDIA reference

SwanLab or training log link:

Reward curve comparison (first 100 steps):

Checklist

  • [√] Code follows the project's style and passes pre-commit checks
  • [√] Documentation updated (if applicable)
  • [√] No secrets or credentials included

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request adds support for the Enflame GCU platform and its corresponding FSDP and Megatron training engines, along with documentation and unit tests. The feedback highlights a few issues: an unresolved git merge conflict marker in the README, a potential runtime TypeError when patching the torch.gcu.Stream C-extension class, and the need to respect the use_smi_check flag in the platform availability check to prevent premature hardware initialization.
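
One way to guard against the TypeError mentioned above is to attempt the patch and fall back when the type rejects new attributes. The sketch below is an assumption-laden illustration: `try_patch_attr` and `PyStream` are hypothetical, and the built-in `int` stands in for an immutable C-implemented class, since whether torch.gcu.Stream actually rejects attribute assignment depends on how it is built.

```python
def try_patch_attr(cls: type, name: str, value: object) -> bool:
    """Attach `value` to `cls` as `name`; return False if the type rejects it."""
    if hasattr(cls, name):
        return True  # Already present; leave the existing attribute alone.
    try:
        setattr(cls, name, value)
    except TypeError:
        # Built-in types and many C-extension types are immutable and
        # raise TypeError on attribute assignment.
        return False
    return True


class PyStream:
    """Hypothetical pure-Python stream class that accepts new attributes."""

    handle = 7


patched_py = try_patch_attr(PyStream, "cuda_stream", property(lambda self: self.handle))
# `int` is used here only as an example of a type that cannot be patched.
patched_builtin = try_patch_attr(int, "cuda_stream", property(lambda self: 0))
```

When the helper returns False, the caller can log the condition and use a wrapper or subclass instead of letting platform initialization fail.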

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread README.md Outdated
Comment thread verl_hardware_plugin/platforms/platform_enflame.py Outdated
Comment thread verl_hardware_plugin/platforms/platform_enflame.py
@gongxijun

gongxijun commented Jun 23, 2026

Author

Our test script:

#!/bin/bash
# ENFLAME GCU single-chip FL training example (verl-FL + Migration).
#
# Prerequisites:
#   - pip install migration  (ENFLAME GCU runtime patches)
#   - pip install verl-FL      (PlatformENFLAME builtin)
#   - torch_gcu, ECCL, FlagGems, TransformerEngine-FL, vllm-plugin-FL
#
# Startup order: Migration patches apply on import before verl initializes PlatformENFLAME.

set -x

# ============ Migration bootstrap (must run before verl) ============
export ENFLAME_ENABLE_AUTO_MIGRATION=1
export PYTHONPATH="/home/xijun.gong/icode/Megatron-LM-FL"
export PYTHONPATH="${PYTHONPATH:-}"
export RAY_DEDUP_LOGS=0
#python3 -c "import verl; from verl.plugin.platform import get_platform; p=get_platform(); assert p.device_name in ('enflame', 'gcu'), f'unexpected platform: {p.device_name}'; print('platform:', p.device_name)"

# ============ ENFLAME Platform ============
export VERL_PLATFORM="${VERL_PLATFORM:-enflame}"
export TOPS_VISIBLE_DEVICES="${TOPS_VISIBLE_DEVICES:-0,1,2,3}"
export RAY_EXPERIMENTAL_NOSET_TOPS_VISIBLE_DEVICES=1
export RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
export HYDRA_FULL_ERROR=1
export VERL_LOGGING_LEVEL=DEBUG
# ============ FL / Communication (single-chip ECCL homogeneous) ============
# export VERL_ENGINE_DEVICE=flagos
# export TE_FL_PREFER=flagos
export TE_FL_STRICT=0
# export USE_FLAGGEMS=true
export USE_FLAGGEMS=false
export VLLM_FL_OOT_ENABLED=1
export TRAIN_FILES="/home/xijun.gong/icode/gsm8k/train.parquet"
export VAL_FILES="/home/xijun.gong/icode/gsm8k/test.parquet"
export MODEL_PATH="/home/xijun.gong/Qwen3-0.6B"
export TEFL_LOG_LEVEL=DEBUG
export TE_FL_SKIP_CUDA=1
export TE_FL_PREFER=vendor
export NVTE_DEBUG=1
export NVTE_DEBUG_LEVEL=2
export ENFLAME_MIGRATION_CACHE_DIR=./mycache
export ENFLAME_MIGRATION_DUMP_DIR=./migration_debug
export ENFLAME_MIGRATION_LOG_LEVEL=INFO
# Default: ECCL for ENFLAME homogeneous cluster. Set USE_FLAGCX=1 for FlagCX instead.
export USE_FLAGCX="${USE_FLAGCX:-0}"

python -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    +ray_kwargs.ray_init.runtime_env.env_vars.ENFLAME_ENABLE_AUTO_MIGRATION=\'1\' \
    +ray_kwargs.ray_init.runtime_env.env_vars.CUDA_DEVICE_MAX_CONNECTIONS=\'1\' \
    +ray_kwargs.ray_init.runtime_env.env_vars.RAY_EXPERIMENTAL_NOSET_TOPS_VISIBLE_DEVICES=\'1\' \
    +ray_kwargs.ray_init.runtime_env.env_vars.ENFLAME_TE_KERNEL_TRITON_BACKEND=\'fused_rope_fwd,fused_rope_bwd\' \
    +ray_kwargs.ray_init.runtime_env.env_vars.TOPS_VISIBLE_DEVICES=\'0,1,2,3,4,5,6,7\' \
    +ray_kwargs.ray_init.runtime_env.env_vars.VLLM_ALL2ALL_BACKEND=allgather_reducescatter \
    +ray_kwargs.ray_init.runtime_env.env_vars.VERL_LOGGING_LEVEL=DEBUG \
    +ray_kwargs.ray_init.runtime_env.env_vars.VERL_USE_EXTERNAL_MODULES='verl_hardware_plugin' \
    +ray_kwargs.ray_init.runtime_env.env_vars.VERL_PLATFORM=\'enflame\' \
    +ray_kwargs.ray_init.runtime_env.env_vars.PYTHONPATH=\'/home/xijun.gong/icode/Megatron-LM-FL\' \
    +ray_kwargs.ray_init.runtime_env.env_vars.TORCHGCU_INDUCTOR_ENABLE=\'0\' \
    +ray_kwargs.ray_init.runtime_env.env_vars.TORCHDYNAMO_DISABLE=\'1\' \
    +ray_kwargs.ray_init.runtime_env.env_vars.VLLM_ENABLE_V1_MULTIPROCESSING=\'0\' \
    +ray_kwargs.ray_init.runtime_env.env_vars.TORCH_ECCL_AVOID_RECORD_STREAMS=\'1\' \
    actor_rollout_ref.rollout.enforce_eager=True \
    data.train_files="${TRAIN_FILES:-./train.parquet}" \
    data.val_files="${VAL_FILES:-./test.parquet}" \
    data.train_batch_size=64 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    actor_rollout_ref.model.path="${MODEL_PATH:-/path/to/Qwen3-0.6B}" \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=64 \
    actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.actor.entropy_coeff=0 \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.4 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.use_kl_in_reward=False \
    trainer.critic_warmup=0 \
    trainer.logger='["console"]' \
    trainer.project_name='verl_grpo_enflame_fl' \
    trainer.experiment_name='qwen3_0.6b_enflame_fl' \
    trainer.n_gpus_per_node="${N_GPUS_PER_NODE:-4}" \
    trainer.nnodes="${NNODES:-1}" \
    trainer.save_freq=20 \
    trainer.test_freq=5 \
    trainer.ray_wait_register_center_timeout=60 \
    +actor_rollout_ref.rollout.enable_sleep_mode=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    trainer.total_epochs=15 \
    "$@"

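The repeated `+ray_kwargs.ray_init.runtime_env.env_vars.NAME=\'value\'` arguments in the script above all follow one pattern, which can be generated from a dictionary. This is only a sketch: `env_overrides` and `PREFIX` are hypothetical helpers, not part of verl or the plugin, and the quoting rationale in the comment is an inference from the script rather than something stated in the PR.

```python
from typing import Dict, List

PREFIX = "+ray_kwargs.ray_init.runtime_env.env_vars"


def env_overrides(env_vars: Dict[str, str]) -> List[str]:
    """Turn {NAME: value} into Hydra append-overrides with quoted values."""
    # The single quotes are part of each argument, presumably so that
    # values such as 1 reach the config as strings rather than integers.
    return [f"{PREFIX}.{name}='{value}'" for name, value in env_vars.items()]


args = env_overrides(
    {
        "VERL_PLATFORM": "enflame",
        "TORCHDYNAMO_DISABLE": "1",
    }
)
```

If the resulting list is passed as argv entries (for example, appended to `["python", "-m", "verl.trainer.main_ppo"]` and run without a shell), the backslash escaping used in the bash script is not needed.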
@gongxijun

Author

Qwen3-0.6B training run log:

+ export ENFLAME_ENABLE_AUTO_MIGRATION=1
+ ENFLAME_ENABLE_AUTO_MIGRATION=1
+ export PYTHONPATH=/home/xijun.gong/icode/Megatron-LM-FL
+ PYTHONPATH=/home/xijun.gong/icode/Megatron-LM-FL
+ export PYTHONPATH=/home/xijun.gong/icode/Megatron-LM-FL
+ PYTHONPATH=/home/xijun.gong/icode/Megatron-LM-FL
+ export RAY_DEDUP_LOGS=0
+ RAY_DEDUP_LOGS=0
+ export VERL_PLATFORM=enflame
+ VERL_PLATFORM=enflame
+ export TOPS_VISIBLE_DEVICES=0,1,2,3
+ TOPS_VISIBLE_DEVICES=0,1,2,3
+ export RAY_EXPERIMENTAL_NOSET_TOPS_VISIBLE_DEVICES=1
+ RAY_EXPERIMENTAL_NOSET_TOPS_VISIBLE_DEVICES=1
+ export RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
+ RAY_ACCEL_ENV_VAR_OVERRIDE_ON_ZERO=0
+ export HYDRA_FULL_ERROR=1
+ HYDRA_FULL_ERROR=1
+ export VERL_LOGGING_LEVEL=DEBUG
+ VERL_LOGGING_LEVEL=DEBUG
+ export TE_FL_STRICT=0
+ TE_FL_STRICT=0
+ export USE_FLAGGEMS=false
+ USE_FLAGGEMS=false
+ export VLLM_FL_OOT_ENABLED=1
+ VLLM_FL_OOT_ENABLED=1
+ export TRAIN_FILES=/home/xijun.gong/icode/gsm8k/train.parquet
+ TRAIN_FILES=/home/xijun.gong/icode/gsm8k/train.parquet
+ export VAL_FILES=/home/xijun.gong/icode/gsm8k/test.parquet
+ VAL_FILES=/home/xijun.gong/icode/gsm8k/test.parquet
+ export MODEL_PATH=/home/xijun.gong/Qwen3-0.6B
+ MODEL_PATH=/home/xijun.gong/Qwen3-0.6B
+ export TEFL_LOG_LEVEL=DEBUG
+ TEFL_LOG_LEVEL=DEBUG
+ export TE_FL_SKIP_CUDA=1
+ TE_FL_SKIP_CUDA=1
+ export TE_FL_PREFER=vendor
+ TE_FL_PREFER=vendor
+ export NVTE_DEBUG=1
+ NVTE_DEBUG=1
+ export NVTE_DEBUG_LEVEL=2
+ NVTE_DEBUG_LEVEL=2
+ export ENFLAME_MIGRATION_CACHE_DIR=./mycache
+ ENFLAME_MIGRATION_CACHE_DIR=./mycache
+ export ENFLAME_MIGRATION_DUMP_DIR=./migration_debug
+ ENFLAME_MIGRATION_DUMP_DIR=./migration_debug
+ export ENFLAME_MIGRATION_LOG_LEVEL=INFO
+ ENFLAME_MIGRATION_LOG_LEVEL=INFO
+ export USE_FLAGCX=0
+ USE_FLAGCX=0
+ python -m verl.trainer.main_ppo algorithm.adv_estimator=grpo '+ray_kwargs.ray_init.runtime_env.env_vars.ENFLAME_ENABLE_AUTO_MIGRATION='\''1'\''' '+ray_kwargs.ray_init.runtime_env.env_vars.CUDA_DEVICE_MAX_CONNECTIONS='\''1'\''' '+ray_kwargs.ray_init.runtime_env.env_vars.RAY_EXPERIMENTAL_NOSET_TOPS_VISIBLE_DEVICES='\''1'\''' '+ray_kwargs.ray_init.runtime_env.env_vars.ENFLAME_TE_KERNEL_TRITON_BACKEND='\''fused_rope_fwd,fused_rope_bwd'\''' '+ray_kwargs.ray_init.runtime_env.env_vars.TOPS_VISIBLE_DEVICES='\''0,1,2,3,4,5,6,7'\''' +ray_kwargs.ray_init.runtime_env.env_vars.VLLM_ALL2ALL_BACKEND=allgather_reducescatter +ray_kwargs.ray_init.runtime_env.env_vars.VERL_LOGGING_LEVEL=DEBUG +ray_kwargs.ray_init.runtime_env.env_vars.VERL_USE_EXTERNAL_MODULES=verl_hardware_plugin '+ray_kwargs.ray_init.runtime_env.env_vars.VERL_PLATFORM='\''enflame'\''' '+ray_kwargs.ray_init.runtime_env.env_vars.PYTHONPATH='\''/home/xijun.gong/icode/Megatron-LM-FL'\''' '+ray_kwargs.ray_init.runtime_env.env_vars.TORCHGCU_INDUCTOR_ENABLE='\''0'\''' '+ray_kwargs.ray_init.runtime_env.env_vars.TORCHDYNAMO_DISABLE='\''1'\''' '+ray_kwargs.ray_init.runtime_env.env_vars.VLLM_ENABLE_V1_MULTIPROCESSING='\''0'\''' '+ray_kwargs.ray_init.runtime_env.env_vars.TORCH_ECCL_AVOID_RECORD_STREAMS='\''1'\''' actor_rollout_ref.rollout.enforce_eager=True data.train_files=/home/xijun.gong/icode/gsm8k/train.parquet data.val_files=/home/xijun.gong/icode/gsm8k/test.parquet data.train_batch_size=64 data.max_prompt_length=512 data.max_response_length=1024 data.filter_overlong_prompts=True data.truncation=error actor_rollout_ref.model.path=/home/xijun.gong/Qwen3-0.6B actor_rollout_ref.actor.optim.lr=1e-6 actor_rollout_ref.model.use_remove_padding=True actor_rollout_ref.actor.ppo_mini_batch_size=64 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 actor_rollout_ref.actor.use_kl_loss=True actor_rollout_ref.actor.kl_loss_coef=0.001 actor_rollout_ref.actor.kl_loss_type=low_var_kl actor_rollout_ref.actor.entropy_coeff=0 
actor_rollout_ref.model.enable_gradient_checkpointing=True actor_rollout_ref.actor.fsdp_config.param_offload=False actor_rollout_ref.actor.fsdp_config.optimizer_offload=False actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 actor_rollout_ref.rollout.tensor_model_parallel_size=1 actor_rollout_ref.rollout.name=vllm actor_rollout_ref.rollout.gpu_memory_utilization=0.4 actor_rollout_ref.rollout.n=5 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=4 actor_rollout_ref.ref.fsdp_config.param_offload=True algorithm.use_kl_in_reward=False trainer.critic_warmup=0 'trainer.logger=["console"]' trainer.project_name=verl_grpo_enflame_fl trainer.experiment_name=qwen3_0.6b_enflame_fl trainer.n_gpus_per_node=4 trainer.nnodes=1 trainer.save_freq=20 trainer.test_freq=5 trainer.ray_wait_register_center_timeout=60 +actor_rollout_ref.rollout.enable_sleep_mode=False actor_rollout_ref.rollout.free_cache_engine=False trainer.total_epochs=15
[W623 08:44:12.347460797 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
/usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
/usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:443: ImportWarning: 
    *************************************************************************************************************
    The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.gcu and torch.nn.Module.gcu now..
    The backend in torch.distributed.init_process_group set to eccl now..
    The torch.cuda.* and torch.cuda.amp.* are replaced with torch.gcu.* and torch.gcu.amp.* now..
    The device parameters have been replaced with gcu in the function below:
    torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
    *************************************************************************************************************
    
  warnings.warn(msg, ImportWarning)
INFO:2026-06-23 08:44:14,258:Registered platform: intel (xpu)
INFO:2026-06-23 08:44:14,336:Registered platform: cambricon (mlu)
INFO:2026-06-23 08:44:14,413:Registered platform: metax (cuda)
INFO:2026-06-23 08:44:14,485:Registered platform: enflame (gcu)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1782204255.538181  156726 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
I0000 00:00:1782204255.538865  156726 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
I0000 00:00:1782204255.576585  156726 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1782204256.463520  156726 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
I0000 00:00:1782204256.463867  156726 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
/usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
  EPOCH = datetime.datetime.utcfromtimestamp(0)
/usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
  _epoch = datetime.utcfromtimestamp(0)
/usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
  warnings.warn(msg, RuntimeWarning)
INFO:2026-06-23 08:44:18,321:Platform override from VERL_PLATFORM: enflame
DEBUG:2026-06-23 08:44:18,321:verl platform initialised: gcu
DEBUG:2026-06-23 08:44:19,190:FlagOS FSDP engines not registered: Either mode or options can be specified, but both can't be specified at the same time.
DEBUG:2026-06-23 08:44:19,268:FlagOS Megatron engines not registered: 'torchtitan'
INFO:2026-06-23 08:44:19,350:Registered engines: fsdp_xpu
DEBUG:2026-06-23 08:44:19,438:XPU Megatron engines not registered: 'torchtitan'
INFO:2026-06-23 08:44:19,524:Registered engines: fsdp_mlu
DEBUG:2026-06-23 08:44:19,602:MLU Megatron engines not registered: 'torchtitan'
INFO:2026-06-23 08:44:19,679:Registered engines: fsdp_metax
DEBUG:2026-06-23 08:44:19,759:MetaX Megatron engines not registered: 'torchtitan'
INFO:2026-06-23 08:44:19,841:Registered engines: fsdp_enflame
ERROR:2026-06-23 08:44:19,924:Failed to register Enflame Megatron engines (required): 'torchtitan'
INFO:2026-06-23 08:44:21,714:Registered platform: intel (xpu)
INFO:2026-06-23 08:44:21,714:Registered platform: cambricon (mlu)
INFO:2026-06-23 08:44:21,714:Registered platform: metax (cuda)
INFO:2026-06-23 08:44:21,714:Registered platform: enflame (gcu)
+++++++++++++++++transformer_engine...........
[2026-06-23 08:44:22,151][verl.utils.device][WARNING] - Detect setting config.trainer.device to cuda for gcu, automatically set to `gcu` instead.
/usr/local/lib/python3.12/dist-packages/verl/trainer/main_ppo.py:165: UserWarning: Disabled critic as algorithm.adv_estimator != gae. If it is not intended, please set critic.enable=True
  
/usr/local/lib/python3.12/dist-packages/ray/util/state/util.py:55: DeprecationWarning: Ray state API is no longer experimental. Please import from `ray.util.state`. instead. Importing from `ray.experimental` will be deprecated in future releases. 
  warnings.warn(
[validate_config] All configuration checks passed successfully!
[2026-06-23 08:44:22,409][__main__][WARNING] - Legacy trainer `main_ppo_v0.py` is deprecated, and wil be removed in v0.9.0.Please set `trainer.use_v1=True` in config to use V1 trainer.
/usr/local/lib/python3.12/dist-packages/ray/_private/node.py:1136: ResourceWarning: unclosed file <_io.TextIOWrapper name='/dev/null' mode='w' encoding='UTF-8'>
  process_info = ray._private.services.start_gcs_server(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2026-06-23 08:45:23,801	ERROR services.py:1342 -- Failed to start the dashboard 
2026-06-23 08:45:23,801	ERROR services.py:1367 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#logging-directory-structure' to find where the log file is.
2026-06-23 08:45:23,801	ERROR services.py:1411 -- 
The last 20 lines of /tmp/ray/session_2026-06-23_08-44-22_414043_156726/logs/dashboard.log (it contains the error message from the dashboard): 
  File "/usr/local/lib/python3.12/dist-packages/ray/dashboard/head.py", line 149, in _configure_http_server
    await self.http_server.run(dashboard_head_modules, subprocess_module_handles)
  File "/usr/local/lib/python3.12/dist-packages/ray/dashboard/http_server_head.py", line 404, in run
    app.add_routes(routes=routes.bound_routes())
  File "/usr/local/lib/python3.12/dist-packages/aiohttp/web_app.py", line 389, in add_routes
    return self.router.add_routes(routes)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/aiohttp/web_urldispatcher.py", line 1287, in add_routes
    registered_routes.extend(route_def.register(self))
                             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/aiohttp/web_routedef.py", line 98, in register
    resource = router.add_static(self.prefix, self.path, **self.kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/aiohttp/web_urldispatcher.py", line 1211, in add_static
    resource = StaticResource(
               ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/aiohttp/web_urldispatcher.py", line 570, in __init__
    raise ValueError(f"'{directory}' does not exist") from error
ValueError: '/usr/local/lib/python3.12/dist-packages/ray/dashboard/client/build/static' does not exist
/usr/local/lib/python3.12/dist-packages/ray/_private/node.py:1097: ResourceWarning: unclosed file <_io.TextIOWrapper name='/dev/null' mode='w' encoding='UTF-8'>
  self._webui_url, process_info = ray._private.services.start_api_server(
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/usr/lib/python3.12/subprocess.py:1127: ResourceWarning: subprocess 157256 is still running
  _warn("subprocess %s is still running" % self.pid,
ResourceWarning: Enable tracemalloc to get the object allocation traceback
2026-06-23 08:45:24,019	INFO worker.py:2007 -- Started a local Ray instance.
(pid=168816) [W623 08:45:29.163603092 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
(pid=168816) /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
(pid=168816)   import pkg_resources
(pid=168816) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:443: ImportWarning: 
(pid=168816)     *************************************************************************************************************
(pid=168816)     The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.gcu and torch.nn.Module.gcu now..
(pid=168816)     The backend in torch.distributed.init_process_group set to eccl now..
(pid=168816)     The torch.cuda.* and torch.cuda.amp.* are replaced with torch.gcu.* and torch.gcu.amp.* now..
(pid=168816)     The device parameters have been replaced with gcu in the function below:
(pid=168816)     torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
(pid=168816)     *************************************************************************************************************
(pid=168816)
(pid=168816)   warnings.warn(msg, ImportWarning)
(pid=168816) INFO:2026-06-23 08:45:31,001:Registered platform: intel (xpu)
(pid=168816) INFO:2026-06-23 08:45:31,028:Registered platform: cambricon (mlu)
(pid=168816) INFO:2026-06-23 08:45:31,054:Registered platform: metax (cuda)
(pid=168816) INFO:2026-06-23 08:45:31,081:Registered platform: enflame (gcu)
(pid=168816) WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
(pid=168816) I0000 00:00:1782204332.215233  168816 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=168816) I0000 00:00:1782204332.215967  168816 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
(pid=168816) I0000 00:00:1782204332.258817  168816 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
(pid=168816) To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
(pid=168816) WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
(pid=168816) I0000 00:00:1782204333.297319  168816 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=168816) I0000 00:00:1782204333.297710  168816 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
(pid=168816) /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
(pid=168816)   EPOCH = datetime.datetime.utcfromtimestamp(0)
(pid=168816) /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
(pid=168816)   _epoch = datetime.utcfromtimestamp(0)
(pid=168816) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=168816)   warnings.warn(msg, RuntimeWarning)
(pid=168816) INFO:2026-06-23 08:45:35,710:Platform override from VERL_PLATFORM: enflame
(pid=168816) DEBUG:2026-06-23 08:45:35,710:verl platform initialised: gcu
(pid=168816) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=168816)   warnings.warn(
(pid=168816) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=168816)   warnings.warn(msg, RuntimeWarning)
(pid=168816) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=168816)   from distutils.dep_util import newer, newer_group
(pid=168816) /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/inference/contexts/__init__.py:9: DeprecationWarning: The following imports from `dynamic_context.py` will be removed in this file in `megatron-core` 0.14. The imports here result in a cyclic import issue that causes rotary embeddings to import from Apex rather than Transformer Engine.
(pid=168816)   warnings.warn(
(pid=168816) /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/optimizer/clip_grads.py:32: UserWarning: Transformer Engine and Apex are not installed. Falling back to local implementations of multi_tensor_applier, multi_tensor_l2norm, and multi_tensor_scale
(pid=168816)   warnings.warn(
(pid=168816) /usr/local/lib/python3.12/dist-packages/torch/_utils.py:916: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
(pid=168816)   return self.fget.__get__(instance, owner)()
(pid=168816) /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/models/backends.py:34: UserWarning: Apex is not installed. Falling back to Torch Norm
(pid=168816)   warnings.warn("Apex is not installed. Falling back to Torch Norm")
(pid=168816) /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/models/gpt/gpt_layer_specs.py:71: UserWarning: Apex is not installed. Falling back to Torch Norm
(pid=168816)   warnings.warn("Apex is not installed. Falling back to Torch Norm")
(pid=168816) INFO:2026-06-23 08:46:13,336:Registered engines: fsdp_flagos
(pid=168816) INFO:2026-06-23 08:46:13,419:Registered engines: megatron_flagos
(pid=168816) INFO:2026-06-23 08:46:13,497:Registered engines: fsdp_xpu
(pid=168816) INFO:2026-06-23 08:46:13,573:Registered engines: megatron_xpu
(pid=168816) INFO:2026-06-23 08:46:13,651:Registered engines: fsdp_mlu
(pid=168816) INFO:2026-06-23 08:46:13,730:Registered engines: megatron_mlu
(pid=168816) INFO:2026-06-23 08:46:13,807:Registered engines: fsdp_metax
(pid=168816) INFO:2026-06-23 08:46:13,885:Registered engines: megatron_metax
(pid=168816) INFO:2026-06-23 08:46:13,965:Registered engines: fsdp_enflame
(pid=168816) INFO:2026-06-23 08:46:14,047:Registered engines: megatron_enflame
(pid=168816) INFO:2026-06-23 08:46:14,047:verl-hardware-plugin loaded successfully
(pid=168816) /usr/local/lib/python3.12/dist-packages/ray/util/state/util.py:55: DeprecationWarning: Ray state API is no longer experimental. Please import from `ray.util.state`. instead. Importing from `ray.experimental` will be deprecated in future releases. 
(pid=168816)   warnings.warn(
ray init kwargs: {'num_cpus': None, 'runtime_env': {'env_vars': {'TOKENIZERS_PARALLELISM': 'true', 'NCCL_DEBUG': 'WARN', 'VLLM_LOGGING_LEVEL': 'WARN', 'VLLM_ALLOW_RUNTIME_LORA_UPDATING': 'true', 'CUDA_DEVICE_MAX_CONNECTIONS': '1', 'VLLM_DISABLE_COMPILE_CACHE': '1', 'HCCL_HOST_SOCKET_PORT_RANGE': 'auto', 'HCCL_NPU_SOCKET_PORT_RANGE': 'auto', 'HSA_NO_SCRATCH_RECLAIM': '1', 'ENFLAME_ENABLE_AUTO_MIGRATION': '1', 'RAY_EXPERIMENTAL_NOSET_TOPS_VISIBLE_DEVICES': '1', 'ENFLAME_TE_KERNEL_TRITON_BACKEND': 'fused_rope_fwd,fused_rope_bwd', 'TOPS_VISIBLE_DEVICES': '0,1,2,3,4,5,6,7', 'VLLM_ALL2ALL_BACKEND': 'allgather_reducescatter', 'VERL_LOGGING_LEVEL': 'DEBUG', 'VERL_USE_EXTERNAL_MODULES': 'verl_hardware_plugin', 'VERL_PLATFORM': 'enflame', 'PYTHONPATH': '/home/xijun.gong/icode/Megatron-LM-FL', 'TORCHGCU_INDUCTOR_ENABLE': '0', 'TORCHDYNAMO_DISABLE': '1', 'VLLM_ENABLE_V1_MULTIPROCESSING': '0', 'TORCH_ECCL_AVOID_RECORD_STREAMS': '1', 'NCCL_CUMEM_ENABLE': '0'}, 'working_dir': None}}
(pid=168816) +++++++++++++++++transformer_engine...........
(pid=168816) Megatron-LM-FL Platform: cpu Registered
(pid=168816) Megatron-LM-FL Platform: cuda Registered
(pid=168816) Megatron-LM-FL Platform: enflame Registered
(pid=168816) Megatron-LM-FL Platform: cuda Selected
(TaskRunner pid=168816) TaskRunner hostname: sse-jq-118-3, PID: 168816
(TaskRunner pid=168816) {'actor_rollout_ref': {'actor': {'_target_': 'verl.workers.config.FSDPActorConfig',
(TaskRunner pid=168816)                                  'calculate_entropy': False,
(TaskRunner pid=168816)                                  'calculate_sum_pi_squared': False,
(TaskRunner pid=168816)                                  'checkpoint': {'_target_': 'verl.trainer.config.CheckpointConfig',
(TaskRunner pid=168816)                                                 'async_save': False,
(TaskRunner pid=168816)                                                 'load_contents': ['model',
(TaskRunner pid=168816)                                                                   'optimizer',
(TaskRunner pid=168816)                                                                   'extra'],
(TaskRunner pid=168816)                                                 'save_contents': ['model',
(TaskRunner pid=168816)                                                                   'optimizer',
(TaskRunner pid=168816)                                                                   'extra'],
(TaskRunner pid=168816)                                                 'strict': True},
(TaskRunner pid=168816)                                  'clip_ratio': 0.2,
(TaskRunner pid=168816)                                  'clip_ratio_c': 3.0,
(TaskRunner pid=168816)                                  'clip_ratio_high': 0.2,
(TaskRunner pid=168816)                                  'clip_ratio_low': 0.2,
(TaskRunner pid=168816)                                  'data_loader_seed': 42,
(TaskRunner pid=168816)                                  'entropy_checkpointing': False,
(TaskRunner pid=168816)                                  'entropy_coeff': 0,
(TaskRunner pid=168816)                                  'entropy_from_logits_chunk_size': 2048,
(TaskRunner pid=168816)                                  'entropy_from_logits_with_chunking': False,
(TaskRunner pid=168816)                                  'freeze_vision_tower': False,
(TaskRunner pid=168816)                                  'fsdp_config': {'_target_': 'verl.workers.config.FSDPEngineConfig',
(TaskRunner pid=168816)                                                  'dtype': 'bfloat16',
(TaskRunner pid=168816)                                                  'entropy_checkpointing': False,
(TaskRunner pid=168816)                                                  'entropy_from_logits_chunk_size': 2048,
(TaskRunner pid=168816)                                                  'entropy_from_logits_with_chunking': False,
(TaskRunner pid=168816)                                                  'forward_only': False,
(TaskRunner pid=168816)                                                  'forward_prefetch': False,
(TaskRunner pid=168816)                                                  'fsdp_size': -1,
(TaskRunner pid=168816)                                                  'full_determinism': False,
(TaskRunner pid=168816)                                                  'model_dtype': 'fp32',
(TaskRunner pid=168816)                                                  'offload_policy': False,
(TaskRunner pid=168816)                                                  'optimizer_offload': False,
(TaskRunner pid=168816)                                                  'param_offload': False,
(TaskRunner pid=168816)                                                  'qat': {'_target_': 'verl.workers.config.QATEngineConfig',
(TaskRunner pid=168816)                                                          'activation_observer': 'static_minmax',
(TaskRunner pid=168816)                                                          'enable': False,
(TaskRunner pid=168816)                                                          'group_size': 16,
(TaskRunner pid=168816)                                                          'ignore_patterns': ['lm_head',
(TaskRunner pid=168816)                                                                              'embed_tokens',
(TaskRunner pid=168816)                                                                              're:.*mlp.gate$'],
(TaskRunner pid=168816)                                                          'mode': 'w4a16',
(TaskRunner pid=168816)                                                          'quantization_config_path': None},
(TaskRunner pid=168816)                                                  'reshard_after_forward': True,
(TaskRunner pid=168816)                                                  'seed': 42,
(TaskRunner pid=168816)                                                  'strategy': 'fsdp',
(TaskRunner pid=168816)                                                  'ulysses_sequence_parallel_size': 1,
(TaskRunner pid=168816)                                                  'use_orig_params': False,
(TaskRunner pid=168816)                                                  'use_torch_compile': True,
(TaskRunner pid=168816)                                                  'wrap_policy': {'min_num_params': 0}},
(TaskRunner pid=168816)                                  'grad_clip': 1.0,
(TaskRunner pid=168816)                                  'kl_loss_coef': 0.001,
(TaskRunner pid=168816)                                  'kl_loss_type': 'low_var_kl',
(TaskRunner pid=168816)                                  'loss_agg_mode': 'token-mean',
(TaskRunner pid=168816)                                  'loss_scale_factor': None,
(TaskRunner pid=168816)                                  'optim': {'_target_': 'verl.workers.config.FSDPOptimizerConfig',
(TaskRunner pid=168816)                                            'betas': [0.9, 0.999],
(TaskRunner pid=168816)                                            'clip_grad': 1.0,
(TaskRunner pid=168816)                                            'lr': 1e-06,
(TaskRunner pid=168816)                                            'lr_scheduler_type': 'constant',
(TaskRunner pid=168816)                                            'lr_warmup_steps': -1,
(TaskRunner pid=168816)                                            'lr_warmup_steps_ratio': 0.0,
(TaskRunner pid=168816)                                            'min_lr_ratio': 0.0,
(TaskRunner pid=168816)                                            'num_cycles': 0.5,
(TaskRunner pid=168816)                                            'optimizer': 'AdamW',
(TaskRunner pid=168816)                                            'optimizer_impl': 'torch.optim',
(TaskRunner pid=168816)                                            'override_optimizer_config': None,
(TaskRunner pid=168816)                                            'total_training_steps': -1,
(TaskRunner pid=168816)                                            'warmup_style': None,
(TaskRunner pid=168816)                                            'weight_decay': 0.01,
(TaskRunner pid=168816)                                            'zero_indexed_step': True},
(TaskRunner pid=168816)                                  'policy_loss': {'_target_': 'verl.workers.config.PolicyLossConfig',
(TaskRunner pid=168816)                                                  'clip_cov_lb': 1.0,
(TaskRunner pid=168816)                                                  'clip_cov_ratio': 0.0002,
(TaskRunner pid=168816)                                                  'clip_cov_ub': 5.0,
(TaskRunner pid=168816)                                                  'kl_cov_ratio': 0.0002,
(TaskRunner pid=168816)                                                  'loss_mode': 'vanilla',
(TaskRunner pid=168816)                                                  'ppo_kl_coef': 0.1},
(TaskRunner pid=168816)                                  'ppo_epochs': 1,
(TaskRunner pid=168816)                                  'ppo_max_token_len_per_gpu': 16384,
(TaskRunner pid=168816)                                  'ppo_micro_batch_size': None,
(TaskRunner pid=168816)                                  'ppo_micro_batch_size_per_gpu': 4,
(TaskRunner pid=168816)                                  'ppo_mini_batch_size': 64,
(TaskRunner pid=168816)                                  'profiler': {'_target_': 'verl.utils.profiler.ProfilerConfig',
(TaskRunner pid=168816)                                               'all_ranks': False,
(TaskRunner pid=168816)                                               'enable': False,
(TaskRunner pid=168816)                                               'ranks': [],
(TaskRunner pid=168816)                                               'save_path': 'outputs/profile',
(TaskRunner pid=168816)                                               'tool': None,
(TaskRunner pid=168816)                                               'tool_config': {'npu': {'_target_': 'verl.utils.profiler.config.NPUToolConfig',
(TaskRunner pid=168816)                                                                       'analysis': True,
(TaskRunner pid=168816)                                                                       'contents': [],
(TaskRunner pid=168816)                                                                       'discrete': False,
(TaskRunner pid=168816)                                                                       'level': 'level0'},
(TaskRunner pid=168816)                                                               'nsys': {'_target_': 'verl.utils.profiler.config.NsightToolConfig',
(TaskRunner pid=168816)                                                                        'discrete': False},
(TaskRunner pid=168816)                                                               'precision_debugger': {'_target_': 'verl.utils.profiler.config.PrecisionDebuggerToolConfig',
(TaskRunner pid=168816)                                                                                      'config_path': None,
(TaskRunner pid=168816)                                                                                      'stages': None,
(TaskRunner pid=168816)                                                                                      'steps': None,
(TaskRunner pid=168816)                                                                                      'strict': False},
(TaskRunner pid=168816)                                                               'torch': {'_target_': 'verl.utils.profiler.config.TorchProfilerToolConfig',
(TaskRunner pid=168816)                                                                         'contents': [],
(TaskRunner pid=168816)                                                                         'discrete': False},
(TaskRunner pid=168816)                                                               'torch_memory': {'_target_': 'verl.utils.profiler.config.TorchMemoryToolConfig',
(TaskRunner pid=168816)                                                                                'stack_depth': 32,
(TaskRunner pid=168816)                                                                                'trace_alloc_max_entries': 100000}}},
(TaskRunner pid=168816)                                  'qat': {'activation_observer': 'static_minmax',
(TaskRunner pid=168816)                                          'enable': False,
(TaskRunner pid=168816)                                          'group_size': 16,
(TaskRunner pid=168816)                                          'ignore_patterns': ['lm_head',
(TaskRunner pid=168816)                                                              'embed_tokens',
(TaskRunner pid=168816)                                                              're:.*mlp.gate$'],
(TaskRunner pid=168816)                                          'mode': 'w4a16',
(TaskRunner pid=168816)                                          'quantization_config_path': None},
(TaskRunner pid=168816)                                  'rollout_n': 5,
(TaskRunner pid=168816)                                  'router_replay': {'_target_': 'verl.workers.config.RouterReplayConfig',
(TaskRunner pid=168816)                                                    'mode': 'disabled',
(TaskRunner pid=168816)                                                    'record_file': None,
(TaskRunner pid=168816)                                                    'replay_file': None},
(TaskRunner pid=168816)                                  'shuffle': False,
(TaskRunner pid=168816)                                  'strategy': 'fsdp',
(TaskRunner pid=168816)                                  'tau_neg': 1.05,
(TaskRunner pid=168816)                                  'tau_pos': 1.0,
(TaskRunner pid=168816)                                  'ulysses_sequence_parallel_size': 1,
(TaskRunner pid=168816)                                  'use_dynamic_bsz': False,
(TaskRunner pid=168816)                                  'use_fused_kernels': False,
(TaskRunner pid=168816)                                  'use_kl_loss': True,
(TaskRunner pid=168816)                                  'use_prefix_grouper': False,
(TaskRunner pid=168816)                                  'use_remove_padding': True,
(TaskRunner pid=168816)                                  'use_torch_compile': True},
(TaskRunner pid=168816)                        'hybrid_engine': True,
(TaskRunner pid=168816)                        'model': {'_target_': 'verl.workers.config.HFModelConfig',
(TaskRunner pid=168816)                                  'custom_chat_template': None,
(TaskRunner pid=168816)                                  'enable_activation_offload': False,
(TaskRunner pid=168816)                                  'enable_gradient_checkpointing': True,
(TaskRunner pid=168816)                                  'exclude_modules': None,
(TaskRunner pid=168816)                                  'external_lib': None,
(TaskRunner pid=168816)                                  'fused_kernel_options': {'impl_backend': 'torch'},
(TaskRunner pid=168816)                                  'hf_config_path': None,
(TaskRunner pid=168816)                                  'lora': {'a2a_experimental': False,
(TaskRunner pid=168816)                                           'adapter_path': None,
(TaskRunner pid=168816)                                           'alpha': 32,
(TaskRunner pid=168816)                                           'dropout': 0.0,
(TaskRunner pid=168816)                                           'dropout_position': 'pre',
(TaskRunner pid=168816)                                           'dtype': None,
(TaskRunner pid=168816)                                           'exclude_modules': [],
(TaskRunner pid=168816)                                           'freeze_language_model': True,
(TaskRunner pid=168816)                                           'freeze_vision_model': True,
(TaskRunner pid=168816)                                           'freeze_vision_projection': True,
(TaskRunner pid=168816)                                           'lora_A_init_method': 'xavier',
(TaskRunner pid=168816)                                           'lora_B_init_method': 'zero',
(TaskRunner pid=168816)                                           'merge': False,
(TaskRunner pid=168816)                                           'rank': 0,
(TaskRunner pid=168816)                                           'target_modules': ['linear_qkv',
(TaskRunner pid=168816)                                                              'linear_proj',
(TaskRunner pid=168816)                                                              'linear_fc1',
(TaskRunner pid=168816)                                                              'linear_fc2'],
(TaskRunner pid=168816)                                           'type': 'lora'},
(TaskRunner pid=168816)                                  'lora_adapter_path': None,
(TaskRunner pid=168816)                                  'lora_alpha': 16,
(TaskRunner pid=168816)                                  'lora_rank': 0,
(TaskRunner pid=168816)                                  'mtp': {'_target_': 'verl.workers.config.MtpConfig',
(TaskRunner pid=168816)                                          'detach_encoder': False,
(TaskRunner pid=168816)                                          'enable': False,
(TaskRunner pid=168816)                                          'enable_rollout': False,
(TaskRunner pid=168816)                                          'enable_train': False,
(TaskRunner pid=168816)                                          'method': 'mtp',
(TaskRunner pid=168816)                                          'mtp_loss_scaling_factor': 0.1,
(TaskRunner pid=168816)                                          'num_speculative_tokens': 1,
(TaskRunner pid=168816)                                          'speculative_algorithm': 'EAGLE',
(TaskRunner pid=168816)                                          'speculative_eagle_topk': 1,
(TaskRunner pid=168816)                                          'speculative_num_draft_tokens': 4,
(TaskRunner pid=168816)                                          'speculative_num_steps': 3},
(TaskRunner pid=168816)                                  'override_config': {},
(TaskRunner pid=168816)                                  'path': '/home/xijun.gong/Qwen3-0.6B',
(TaskRunner pid=168816)                                  'target_modules': 'all-linear',
(TaskRunner pid=168816)                                  'tiled_mlp': {'enabled': False,
(TaskRunner pid=168816)                                                'num_shards': 4},
(TaskRunner pid=168816)                                  'tokenizer_path': None,
(TaskRunner pid=168816)                                  'trust_remote_code': False,
(TaskRunner pid=168816)                                  'use_fused_kernels': False,
(TaskRunner pid=168816)                                  'use_liger': False,
(TaskRunner pid=168816)                                  'use_remove_padding': True,
(TaskRunner pid=168816)                                  'use_shm': False},
(TaskRunner pid=168816)                        'nccl_timeout': 600,
(TaskRunner pid=168816)                        'ref': {'_target_': 'verl.workers.config.FSDPActorConfig',
(TaskRunner pid=168816)                                'entropy_checkpointing': False,
(TaskRunner pid=168816)                                'entropy_from_logits_chunk_size': 2048,
(TaskRunner pid=168816)                                'entropy_from_logits_with_chunking': False,
(TaskRunner pid=168816)                                'fsdp_config': {'_target_': 'verl.workers.config.FSDPEngineConfig',
(TaskRunner pid=168816)                                                'dtype': 'bfloat16',
(TaskRunner pid=168816)                                                'entropy_checkpointing': False,
(TaskRunner pid=168816)                                                'entropy_from_logits_chunk_size': 2048,
(TaskRunner pid=168816)                                                'entropy_from_logits_with_chunking': False,
(TaskRunner pid=168816)                                                'forward_only': True,
(TaskRunner pid=168816)                                                'forward_prefetch': False,
(TaskRunner pid=168816)                                                'fsdp_size': -1,
(TaskRunner pid=168816)                                                'full_determinism': False,
(TaskRunner pid=168816)                                                'model_dtype': 'fp32',
(TaskRunner pid=168816)                                                'offload_policy': False,
(TaskRunner pid=168816)                                                'optimizer_offload': False,
(TaskRunner pid=168816)                                                'param_offload': True,
(TaskRunner pid=168816)                                                'qat': {'_target_': 'verl.workers.config.QATEngineConfig',
(TaskRunner pid=168816)                                                        'activation_observer': 'static_minmax',
(TaskRunner pid=168816)                                                        'enable': False,
(TaskRunner pid=168816)                                                        'group_size': 16,
(TaskRunner pid=168816)                                                        'ignore_patterns': ['lm_head',
(TaskRunner pid=168816)                                                                            'embed_tokens',
(TaskRunner pid=168816)                                                                            're:.*mlp.gate$'],
(TaskRunner pid=168816)                                                        'mode': 'w4a16',
(TaskRunner pid=168816)                                                        'quantization_config_path': None},
(TaskRunner pid=168816)                                                'reshard_after_forward': True,
(TaskRunner pid=168816)                                                'seed': 42,
(TaskRunner pid=168816)                                                'strategy': 'fsdp',
(TaskRunner pid=168816)                                                'ulysses_sequence_parallel_size': 1,
(TaskRunner pid=168816)                                                'use_orig_params': False,
(TaskRunner pid=168816)                                                'use_torch_compile': True,
(TaskRunner pid=168816)                                                'wrap_policy': {'min_num_params': 0}},
(TaskRunner pid=168816)                                'log_prob_max_token_len_per_gpu': 16384,
(TaskRunner pid=168816)                                'log_prob_micro_batch_size': None,
(TaskRunner pid=168816)                                'log_prob_micro_batch_size_per_gpu': 4,
(TaskRunner pid=168816)                                'log_prob_use_dynamic_bsz': False,
(TaskRunner pid=168816)                                'profiler': {'_target_': 'verl.utils.profiler.ProfilerConfig',
(TaskRunner pid=168816)                                             'all_ranks': False,
(TaskRunner pid=168816)                                             'enable': False,
(TaskRunner pid=168816)                                             'ranks': [],
(TaskRunner pid=168816)                                             'save_path': 'outputs/profile',
(TaskRunner pid=168816)                                             'tool': None,
(TaskRunner pid=168816)                                             'tool_config': {'npu': {'_target_': 'verl.utils.profiler.config.NPUToolConfig',
(TaskRunner pid=168816)                                                                     'analysis': True,
(TaskRunner pid=168816)                                                                     'contents': [],
(TaskRunner pid=168816)                                                                     'discrete': False,
(TaskRunner pid=168816)                                                                     'level': 'level0'},
(TaskRunner pid=168816)                                                             'nsys': {'_target_': 'verl.utils.profiler.config.NsightToolConfig',
(TaskRunner pid=168816)                                                                      'discrete': False},
(TaskRunner pid=168816)                                                             'precision_debugger': {'_target_': 'verl.utils.profiler.config.PrecisionDebuggerToolConfig',
(TaskRunner pid=168816)                                                                                    'config_path': None,
(TaskRunner pid=168816)                                                                                    'stages': None,
(TaskRunner pid=168816)                                                                                    'steps': None,
(TaskRunner pid=168816)                                                                                    'strict': False},
(TaskRunner pid=168816)                                                             'torch': {'_target_': 'verl.utils.profiler.config.TorchProfilerToolConfig',
(TaskRunner pid=168816)                                                                       'contents': [],
(TaskRunner pid=168816)                                                                       'discrete': False},
(TaskRunner pid=168816)                                                             'torch_memory': {'_target_': 'verl.utils.profiler.config.TorchMemoryToolConfig',
(TaskRunner pid=168816)                                                                              'stack_depth': 32,
(TaskRunner pid=168816)                                                                              'trace_alloc_max_entries': 100000}}},
(TaskRunner pid=168816)                                'rollout_n': 5,
(TaskRunner pid=168816)                                'router_replay': {'_target_': 'verl.workers.config.RouterReplayConfig',
(TaskRunner pid=168816)                                                  'mode': 'disabled',
(TaskRunner pid=168816)                                                  'record_file': None,
(TaskRunner pid=168816)                                                  'replay_file': None},
(TaskRunner pid=168816)                                'strategy': 'fsdp',
(TaskRunner pid=168816)                                'ulysses_sequence_parallel_size': 1,
(TaskRunner pid=168816)                                'use_torch_compile': True},
(TaskRunner pid=168816)                        'rollout': {'_target_': 'verl.workers.config.RolloutConfig',
(TaskRunner pid=168816)                                    'agent': {'_target_': 'verl.workers.config.AgentLoopConfig',
(TaskRunner pid=168816)                                              'agent_loop_config_path': None,
(TaskRunner pid=168816)                                              'custom_async_server': {'_target_': 'verl.workers.config.CustomAsyncServerConfig',
(TaskRunner pid=168816)                                                                      'name': None,
(TaskRunner pid=168816)                                                                      'path': None},
(TaskRunner pid=168816)                                              'default_agent_loop': 'single_turn_agent',
(TaskRunner pid=168816)                                              'num_workers': 8},
(TaskRunner pid=168816)                                    'calculate_log_probs': True,
(TaskRunner pid=168816)                                    'checkpoint_engine': {'_target_': 'verl.workers.config.CheckpointEngineConfig',
(TaskRunner pid=168816)                                                          'backend': 'naive',
(TaskRunner pid=168816)                                                          'custom_backend_module': None,
(TaskRunner pid=168816)                                                          'engine_kwargs': {},
(TaskRunner pid=168816)                                                          'update_weights_bucket_megabytes': 2048},
(TaskRunner pid=168816)                                    'cudagraph_capture_sizes': None,
(TaskRunner pid=168816)                                    'data_parallel_size': 1,
(TaskRunner pid=168816)                                    'disable_log_stats': True,
(TaskRunner pid=168816)                                    'disaggregation': {'bootstrap_port': None,
(TaskRunner pid=168816)                                                       'decode_replicas': 1,
(TaskRunner pid=168816)                                                       'decode_tensor_model_parallel_size': None,
(TaskRunner pid=168816)                                                       'enabled': False,
(TaskRunner pid=168816)                                                       'ib_device': None,
(TaskRunner pid=168816)                                                       'prefill_replicas': 1,
(TaskRunner pid=168816)                                                       'transfer_backend': 'nixl'},
(TaskRunner pid=168816)                                    'do_sample': True,
(TaskRunner pid=168816)                                    'dtype': 'bfloat16',
(TaskRunner pid=168816)                                    'enable_chunked_prefill': True,
(TaskRunner pid=168816)                                    'enable_prefix_caching': True,
(TaskRunner pid=168816)                                    'enable_rollout_routing_replay': False,
(TaskRunner pid=168816)                                    'enable_sleep_mode': False,
(TaskRunner pid=168816)                                    'enforce_eager': True,
(TaskRunner pid=168816)                                    'engine_kwargs': {'sglang': {},
(TaskRunner pid=168816)                                                      'trtllm': {},
(TaskRunner pid=168816)                                                      'vllm': {}},
(TaskRunner pid=168816)                                    'expert_parallel_size': 1,
�[36m(TaskRunner pid=168816)�[0m                                    'free_cache_engine': False,
�[36m(TaskRunner pid=168816)�[0m                                    'gpu_memory_utilization': 0.4,
�[36m(TaskRunner pid=168816)�[0m                                    'ignore_eos': False,
�[36m(TaskRunner pid=168816)�[0m                                    'layered_summon': False,
�[36m(TaskRunner pid=168816)�[0m                                    'load_format': 'dummy',
�[36m(TaskRunner pid=168816)�[0m                                    'log_prob_max_token_len_per_gpu': 16384,
�[36m(TaskRunner pid=168816)�[0m                                    'log_prob_micro_batch_size': None,
�[36m(TaskRunner pid=168816)�[0m                                    'log_prob_micro_batch_size_per_gpu': 4,
�[36m(TaskRunner pid=168816)�[0m                                    'log_prob_use_dynamic_bsz': False,
�[36m(TaskRunner pid=168816)�[0m                                    'logprobs_mode': 'processed_logprobs',
�[36m(TaskRunner pid=168816)�[0m                                    'max_model_len': None,
�[36m(TaskRunner pid=168816)�[0m                                    'max_num_batched_tokens': 8192,
�[36m(TaskRunner pid=168816)�[0m                                    'max_num_seqs': 1024,
�[36m(TaskRunner pid=168816)�[0m                                    'mode': 'async',
�[36m(TaskRunner pid=168816)�[0m                                    'mtp': {'_target_': 'verl.workers.config.MtpConfig',
�[36m(TaskRunner pid=168816)�[0m                                            'detach_encoder': False,
�[36m(TaskRunner pid=168816)�[0m                                            'enable': False,
�[36m(TaskRunner pid=168816)�[0m                                            'enable_rollout': False,
�[36m(TaskRunner pid=168816)�[0m                                            'enable_train': False,
�[36m(TaskRunner pid=168816)�[0m                                            'method': 'mtp',
�[36m(TaskRunner pid=168816)�[0m                                            'mtp_loss_scaling_factor': 0.1,
�[36m(TaskRunner pid=168816)�[0m                                            'num_speculative_tokens': 1,
�[36m(TaskRunner pid=168816)�[0m                                            'speculative_algorithm': 'EAGLE',
�[36m(TaskRunner pid=168816)�[0m                                            'speculative_eagle_topk': 1,
�[36m(TaskRunner pid=168816)�[0m                                            'speculative_num_draft_tokens': 4,
�[36m(TaskRunner pid=168816)�[0m                                            'speculative_num_steps': 3},
�[36m(TaskRunner pid=168816)�[0m                                    'multi_stage_wake_up': False,
�[36m(TaskRunner pid=168816)�[0m                                    'multi_turn': {'_target_': 'verl.workers.config.MultiTurnConfig',
�[36m(TaskRunner pid=168816)�[0m                                                   'enable': False,
�[36m(TaskRunner pid=168816)�[0m                                                   'format': 'hermes',
�[36m(TaskRunner pid=168816)�[0m                                                   'function_tool_path': None,
�[36m(TaskRunner pid=168816)�[0m                                                   'max_assistant_turns': None,
�[36m(TaskRunner pid=168816)�[0m                                                   'max_parallel_calls': 1,
�[36m(TaskRunner pid=168816)�[0m                                                   'max_tool_response_length': 256,
�[36m(TaskRunner pid=168816)�[0m                                                   'max_user_turns': None,
�[36m(TaskRunner pid=168816)�[0m                                                   'num_repeat_rollouts': None,
�[36m(TaskRunner pid=168816)�[0m                                                   'tokenization_sanity_check_mode': 'strict',
�[36m(TaskRunner pid=168816)�[0m                                                   'tool_config_path': None,
�[36m(TaskRunner pid=168816)�[0m                                                   'tool_response_truncate_side': 'middle',
�[36m(TaskRunner pid=168816)�[0m                                                   'use_inference_chat_template': False},
�[36m(TaskRunner pid=168816)�[0m                                    'n': 5,
�[36m(TaskRunner pid=168816)�[0m                                    'n_gpus_per_node': 4,
�[36m(TaskRunner pid=168816)�[0m                                    'name': 'vllm',
�[36m(TaskRunner pid=168816)�[0m                                    'nnodes': 0,
�[36m(TaskRunner pid=168816)�[0m                                    'over_sample_rate': 0,
�[36m(TaskRunner pid=168816)�[0m                                    'pipeline_model_parallel_size': 1,
�[36m(TaskRunner pid=168816)�[0m                                    'profiler': {'_target_': 'verl.utils.profiler.ProfilerConfig',
�[36m(TaskRunner pid=168816)�[0m                                                 'all_ranks': False,
�[36m(TaskRunner pid=168816)�[0m                                                 'enable': False,
�[36m(TaskRunner pid=168816)�[0m                                                 'ranks': [],
�[36m(TaskRunner pid=168816)�[0m                                                 'save_path': 'outputs/profile',
�[36m(TaskRunner pid=168816)�[0m                                                 'tool': None,
�[36m(TaskRunner pid=168816)�[0m                                                 'tool_config': {'npu': {'_target_': 'verl.utils.profiler.config.NPUToolConfig',
�[36m(TaskRunner pid=168816)�[0m                                                                         'analysis': True,
�[36m(TaskRunner pid=168816)�[0m                                                                         'contents': [],
�[36m(TaskRunner pid=168816)�[0m                                                                         'discrete': False,
�[36m(TaskRunner pid=168816)�[0m                                                                         'level': 'level0',
�[36m(TaskRunner pid=168816)�[0m                                                                         'profile_token_end': None,
�[36m(TaskRunner pid=168816)�[0m                                                                         'profile_token_start': None},
�[36m(TaskRunner pid=168816)�[0m                                                                 'precision_debugger': {'_target_': 'verl.utils.profiler.config.PrecisionDebuggerToolConfig',
�[36m(TaskRunner pid=168816)�[0m                                                                                        'config_path': None,
�[36m(TaskRunner pid=168816)�[0m                                                                                        'stages': None,
�[36m(TaskRunner pid=168816)�[0m                                                                                        'steps': None,
�[36m(TaskRunner pid=168816)�[0m                                                                                        'strict': False},
�[36m(TaskRunner pid=168816)�[0m                                                                 'torch': {'_target_': 'verl.utils.profiler.config.TorchProfilerToolConfig',
�[36m(TaskRunner pid=168816)�[0m                                                                           'contents': [],
�[36m(TaskRunner pid=168816)�[0m                                                                           'discrete': False,
�[36m(TaskRunner pid=168816)�[0m                                                                           'profile_token_end': None,
�[36m(TaskRunner pid=168816)�[0m                                                                           'profile_token_start': None}}},
�[36m(TaskRunner pid=168816)�[0m                                    'prometheus': {'_target_': 'verl.workers.config.PrometheusConfig',
�[36m(TaskRunner pid=168816)�[0m                                                   'enable': False,
�[36m(TaskRunner pid=168816)�[0m                                                   'file': '/tmp/ray/session_latest/metrics/prometheus/prometheus.yml',
�[36m(TaskRunner pid=168816)�[0m                                                   'port': 9090,
�[36m(TaskRunner pid=168816)�[0m                                                   'served_model_name': '/home/xijun.gong/Qwen3-0.6B'},
�[36m(TaskRunner pid=168816)�[0m                                    'prompt_length': 512,
�[36m(TaskRunner pid=168816)�[0m                                    'qat': {'_target_': 'verl.workers.config.QATEngineConfig',
�[36m(TaskRunner pid=168816)�[0m                                            'activation_observer': 'static_minmax',
�[36m(TaskRunner pid=168816)�[0m                                            'enable': False,
�[36m(TaskRunner pid=168816)�[0m                                            'group_size': 16,
�[36m(TaskRunner pid=168816)�[0m                                            'ignore_patterns': ['lm_head',
�[36m(TaskRunner pid=168816)�[0m                                                                'embed_tokens',
�[36m(TaskRunner pid=168816)�[0m                                                                're:.*mlp.gate$'],
�[36m(TaskRunner pid=168816)�[0m                                            'mode': 'w4a16',
�[36m(TaskRunner pid=168816)�[0m                                            'quantization_config_path': None},
�[36m(TaskRunner pid=168816)�[0m                                    'quantization': None,
�[36m(TaskRunner pid=168816)�[0m                                    'quantization_config_file': None,
�[36m(TaskRunner pid=168816)�[0m                                    'response_length': 1024,
�[36m(TaskRunner pid=168816)�[0m                                    'scheduling_policy': 'fcfs',
�[36m(TaskRunner pid=168816)�[0m                                    'skip_tokenizer_init': True,
�[36m(TaskRunner pid=168816)�[0m                                    'temperature': 1.0,
�[36m(TaskRunner pid=168816)�[0m                                    'tensor_model_parallel_size': 1,
�[36m(TaskRunner pid=168816)�[0m                                    'top_k': -1,
�[36m(TaskRunner pid=168816)�[0m                                    'top_p': 1,
�[36m(TaskRunner pid=168816)�[0m                                    'trace': {'_target_': 'verl.workers.config.TraceConfig',
�[36m(TaskRunner pid=168816)�[0m                                              'backend': None,
�[36m(TaskRunner pid=168816)�[0m                                              'experiment_name': 'qwen3_0.6b_enflame_fl',
�[36m(TaskRunner pid=168816)�[0m                                              'max_samples_per_step_per_worker': None,
�[36m(TaskRunner pid=168816)�[0m                                              'project_name': 'verl_grpo_enflame_fl',
�[36m(TaskRunner pid=168816)�[0m                                              'token2text': False},
�[36m(TaskRunner pid=168816)�[0m                                    'val_kwargs': {'_target_': 'verl.workers.config.SamplingConfig',
�[36m(TaskRunner pid=168816)�[0m                                                   'do_sample': False,
�[36m(TaskRunner pid=168816)�[0m                                                   'n': 1,
�[36m(TaskRunner pid=168816)�[0m                                                   'temperature': 0,
�[36m(TaskRunner pid=168816)�[0m                                                   'top_k': -1,
�[36m(TaskRunner pid=168816)�[0m                                                   'top_p': 1.0}}},
�[36m(TaskRunner pid=168816)�[0m  'algorithm': {'_target_': 'verl.trainer.config.AlgoConfig',
�[36m(TaskRunner pid=168816)�[0m                'adv_estimator': 'grpo',
�[36m(TaskRunner pid=168816)�[0m                'gamma': 1.0,
�[36m(TaskRunner pid=168816)�[0m                'kl_ctrl': {'_target_': 'verl.trainer.config.KLControlConfig',
�[36m(TaskRunner pid=168816)�[0m                            'horizon': 10000,
�[36m(TaskRunner pid=168816)�[0m                            'kl_coef': 0.001,
�[36m(TaskRunner pid=168816)�[0m                            'target_kl': 0.1,
�[36m(TaskRunner pid=168816)�[0m                            'type': 'fixed'},
�[36m(TaskRunner pid=168816)�[0m                'kl_penalty': 'kl',
�[36m(TaskRunner pid=168816)�[0m                'lam': 1.0,
�[36m(TaskRunner pid=168816)�[0m                'norm_adv_by_std_in_grpo': True,
�[36m(TaskRunner pid=168816)�[0m                'pf_ppo': {'reweight_method': 'pow', 'weight_pow': 2.0},
�[36m(TaskRunner pid=168816)�[0m                'rollout_correction': {'bypass_mode': False,
�[36m(TaskRunner pid=168816)�[0m                                       'loss_type': 'ppo_clip',
�[36m(TaskRunner pid=168816)�[0m                                       'rollout_is': None,
�[36m(TaskRunner pid=168816)�[0m                                       'rollout_is_batch_normalize': False,
�[36m(TaskRunner pid=168816)�[0m                                       'rollout_is_threshold': 2.0,
�[36m(TaskRunner pid=168816)�[0m                                       'rollout_rs': None,
�[36m(TaskRunner pid=168816)�[0m                                       'rollout_rs_threshold': None},
�[36m(TaskRunner pid=168816)�[0m                'use_kl_in_reward': False,
�[36m(TaskRunner pid=168816)�[0m                'use_pf_ppo': False},
(TaskRunner pid=168816)  'critic': {'_target_': 'verl.workers.config.FSDPCriticConfig',
�[36m(TaskRunner pid=168816)�[0m             'checkpoint': {'_target_': 'verl.trainer.config.CheckpointConfig',
�[36m(TaskRunner pid=168816)�[0m                            'async_save': False,
�[36m(TaskRunner pid=168816)�[0m                            'load_contents': ['model', 'optimizer', 'extra'],
�[36m(TaskRunner pid=168816)�[0m                            'save_contents': ['model', 'optimizer', 'extra']},
�[36m(TaskRunner pid=168816)�[0m             'cliprange_value': 0.5,
�[36m(TaskRunner pid=168816)�[0m             'data_loader_seed': 42,
�[36m(TaskRunner pid=168816)�[0m             'enable': None,
�[36m(TaskRunner pid=168816)�[0m             'forward_max_token_len_per_gpu': 32768,
�[36m(TaskRunner pid=168816)�[0m             'forward_micro_batch_size': None,
�[36m(TaskRunner pid=168816)�[0m             'forward_micro_batch_size_per_gpu': None,
�[36m(TaskRunner pid=168816)�[0m             'fsdp': {'_target_': 'verl.workers.config.FSDPEngineConfig',
�[36m(TaskRunner pid=168816)�[0m                      'dtype': 'bfloat16',
�[36m(TaskRunner pid=168816)�[0m                      'entropy_checkpointing': False,
�[36m(TaskRunner pid=168816)�[0m                      'entropy_from_logits_chunk_size': 2048,
�[36m(TaskRunner pid=168816)�[0m                      'entropy_from_logits_with_chunking': False,
�[36m(TaskRunner pid=168816)�[0m                      'forward_only': False,
�[36m(TaskRunner pid=168816)�[0m                      'forward_prefetch': False,
�[36m(TaskRunner pid=168816)�[0m                      'fsdp_size': -1,
�[36m(TaskRunner pid=168816)�[0m                      'full_determinism': False,
�[36m(TaskRunner pid=168816)�[0m                      'model_dtype': 'fp32',
�[36m(TaskRunner pid=168816)�[0m                      'offload_policy': False,
�[36m(TaskRunner pid=168816)�[0m                      'optimizer_offload': False,
�[36m(TaskRunner pid=168816)�[0m                      'param_offload': False,
�[36m(TaskRunner pid=168816)�[0m                      'qat': {'_target_': 'verl.workers.config.QATEngineConfig',
�[36m(TaskRunner pid=168816)�[0m                              'activation_observer': 'static_minmax',
�[36m(TaskRunner pid=168816)�[0m                              'enable': False,
�[36m(TaskRunner pid=168816)�[0m                              'group_size': 16,
�[36m(TaskRunner pid=168816)�[0m                              'ignore_patterns': ['lm_head',
�[36m(TaskRunner pid=168816)�[0m                                                  'embed_tokens',
�[36m(TaskRunner pid=168816)�[0m                                                  're:.*mlp.gate$'],
�[36m(TaskRunner pid=168816)�[0m                              'mode': 'w4a16',
�[36m(TaskRunner pid=168816)�[0m                              'quantization_config_path': None},
�[36m(TaskRunner pid=168816)�[0m                      'reshard_after_forward': True,
�[36m(TaskRunner pid=168816)�[0m                      'seed': 42,
�[36m(TaskRunner pid=168816)�[0m                      'strategy': 'fsdp',
�[36m(TaskRunner pid=168816)�[0m                      'ulysses_sequence_parallel_size': 1,
�[36m(TaskRunner pid=168816)�[0m                      'use_orig_params': False,
�[36m(TaskRunner pid=168816)�[0m                      'use_torch_compile': True,
�[36m(TaskRunner pid=168816)�[0m                      'wrap_policy': {'min_num_params': 0}},
�[36m(TaskRunner pid=168816)�[0m             'grad_clip': 1.0,
�[36m(TaskRunner pid=168816)�[0m             'loss_agg_mode': 'token-mean',
�[36m(TaskRunner pid=168816)�[0m             'model': {'_target_': 'verl.workers.config.HFModelConfig',
�[36m(TaskRunner pid=168816)�[0m                       'custom_chat_template': None,
�[36m(TaskRunner pid=168816)�[0m                       'enable_activation_offload': False,
�[36m(TaskRunner pid=168816)�[0m                       'enable_gradient_checkpointing': True,
�[36m(TaskRunner pid=168816)�[0m                       'exclude_modules': None,
�[36m(TaskRunner pid=168816)�[0m                       'external_lib': None,
�[36m(TaskRunner pid=168816)�[0m                       'fused_kernel_options': {'impl_backend': 'torch'},
�[36m(TaskRunner pid=168816)�[0m                       'hf_config_path': None,
�[36m(TaskRunner pid=168816)�[0m                       'lora': {'a2a_experimental': False,
�[36m(TaskRunner pid=168816)�[0m                                'adapter_path': None,
�[36m(TaskRunner pid=168816)�[0m                                'alpha': 32,
�[36m(TaskRunner pid=168816)�[0m                                'dropout': 0.0,
�[36m(TaskRunner pid=168816)�[0m                                'dropout_position': 'pre',
�[36m(TaskRunner pid=168816)�[0m                                'dtype': None,
�[36m(TaskRunner pid=168816)�[0m                                'exclude_modules': [],
�[36m(TaskRunner pid=168816)�[0m                                'freeze_language_model': True,
�[36m(TaskRunner pid=168816)�[0m                                'freeze_vision_model': True,
�[36m(TaskRunner pid=168816)�[0m                                'freeze_vision_projection': True,
�[36m(TaskRunner pid=168816)�[0m                                'lora_A_init_method': 'xavier',
�[36m(TaskRunner pid=168816)�[0m                                'lora_B_init_method': 'zero',
�[36m(TaskRunner pid=168816)�[0m                                'merge': False,
�[36m(TaskRunner pid=168816)�[0m                                'rank': 0,
�[36m(TaskRunner pid=168816)�[0m                                'target_modules': ['linear_qkv',
�[36m(TaskRunner pid=168816)�[0m                                                   'linear_proj',
�[36m(TaskRunner pid=168816)�[0m                                                   'linear_fc1',
�[36m(TaskRunner pid=168816)�[0m                                                   'linear_fc2'],
�[36m(TaskRunner pid=168816)�[0m                                'type': 'lora'},
�[36m(TaskRunner pid=168816)�[0m                       'lora_adapter_path': None,
�[36m(TaskRunner pid=168816)�[0m                       'lora_alpha': 16,
�[36m(TaskRunner pid=168816)�[0m                       'lora_rank': 0,
�[36m(TaskRunner pid=168816)�[0m                       'mtp': {'_target_': 'verl.workers.config.MtpConfig',
�[36m(TaskRunner pid=168816)�[0m                               'detach_encoder': False,
�[36m(TaskRunner pid=168816)�[0m                               'enable': False,
�[36m(TaskRunner pid=168816)�[0m                               'enable_rollout': False,
�[36m(TaskRunner pid=168816)�[0m                               'enable_train': False,
�[36m(TaskRunner pid=168816)�[0m                               'method': 'mtp',
�[36m(TaskRunner pid=168816)�[0m                               'mtp_loss_scaling_factor': 0.1,
�[36m(TaskRunner pid=168816)�[0m                               'num_speculative_tokens': 1,
�[36m(TaskRunner pid=168816)�[0m                               'speculative_algorithm': 'EAGLE',
�[36m(TaskRunner pid=168816)�[0m                               'speculative_eagle_topk': 1,
�[36m(TaskRunner pid=168816)�[0m                               'speculative_num_draft_tokens': 4,
�[36m(TaskRunner pid=168816)�[0m                               'speculative_num_steps': 3},
�[36m(TaskRunner pid=168816)�[0m                       'override_config': {},
�[36m(TaskRunner pid=168816)�[0m                       'path': '~/models/deepseek-llm-7b-chat',
�[36m(TaskRunner pid=168816)�[0m                       'target_modules': 'all-linear',
�[36m(TaskRunner pid=168816)�[0m                       'tiled_mlp': {'enabled': False, 'num_shards': 4},
�[36m(TaskRunner pid=168816)�[0m                       'tokenizer_path': None,
�[36m(TaskRunner pid=168816)�[0m                       'trust_remote_code': False,
�[36m(TaskRunner pid=168816)�[0m                       'use_fused_kernels': False,
�[36m(TaskRunner pid=168816)�[0m                       'use_liger': False,
�[36m(TaskRunner pid=168816)�[0m                       'use_remove_padding': True,
�[36m(TaskRunner pid=168816)�[0m                       'use_shm': False},
�[36m(TaskRunner pid=168816)�[0m             'optim': {'_target_': 'verl.workers.config.FSDPOptimizerConfig',
�[36m(TaskRunner pid=168816)�[0m                       'betas': [0.9, 0.999],
(TaskRunner pid=168816)                       'clip_grad': 1.0,
�[36m(TaskRunner pid=168816)�[0m                       'lr': 1e-05,
�[36m(TaskRunner pid=168816)�[0m                       'lr_scheduler_type': 'constant',
�[36m(TaskRunner pid=168816)�[0m                       'lr_warmup_steps': -1,
�[36m(TaskRunner pid=168816)�[0m                       'lr_warmup_steps_ratio': 0.0,
�[36m(TaskRunner pid=168816)�[0m                       'min_lr_ratio': 0.0,
�[36m(TaskRunner pid=168816)�[0m                       'num_cycles': 0.5,
�[36m(TaskRunner pid=168816)�[0m                       'optimizer': 'AdamW',
�[36m(TaskRunner pid=168816)�[0m                       'optimizer_impl': 'torch.optim',
�[36m(TaskRunner pid=168816)�[0m                       'override_optimizer_config': None,
�[36m(TaskRunner pid=168816)�[0m                       'total_training_steps': -1,
�[36m(TaskRunner pid=168816)�[0m                       'warmup_style': None,
�[36m(TaskRunner pid=168816)�[0m                       'weight_decay': 0.01,
�[36m(TaskRunner pid=168816)�[0m                       'zero_indexed_step': True},
�[36m(TaskRunner pid=168816)�[0m             'ppo_epochs': 1,
�[36m(TaskRunner pid=168816)�[0m             'ppo_max_token_len_per_gpu': 32768,
�[36m(TaskRunner pid=168816)�[0m             'ppo_micro_batch_size': None,
�[36m(TaskRunner pid=168816)�[0m             'ppo_micro_batch_size_per_gpu': None,
�[36m(TaskRunner pid=168816)�[0m             'ppo_mini_batch_size': 64,
�[36m(TaskRunner pid=168816)�[0m             'profiler': {'_target_': 'verl.utils.profiler.ProfilerConfig',
�[36m(TaskRunner pid=168816)�[0m                          'all_ranks': False,
�[36m(TaskRunner pid=168816)�[0m                          'enable': False,
�[36m(TaskRunner pid=168816)�[0m                          'ranks': [],
�[36m(TaskRunner pid=168816)�[0m                          'save_path': 'outputs/profile',
�[36m(TaskRunner pid=168816)�[0m                          'tool': None,
�[36m(TaskRunner pid=168816)�[0m                          'tool_config': {'npu': {'_target_': 'verl.utils.profiler.config.NPUToolConfig',
�[36m(TaskRunner pid=168816)�[0m                                                  'analysis': True,
�[36m(TaskRunner pid=168816)�[0m                                                  'contents': [],
�[36m(TaskRunner pid=168816)�[0m                                                  'discrete': False,
�[36m(TaskRunner pid=168816)�[0m                                                  'level': 'level0'},
�[36m(TaskRunner pid=168816)�[0m                                          'nsys': {'_target_': 'verl.utils.profiler.config.NsightToolConfig',
�[36m(TaskRunner pid=168816)�[0m                                                   'discrete': False},
�[36m(TaskRunner pid=168816)�[0m                                          'precision_debugger': {'_target_': 'verl.utils.profiler.config.PrecisionDebuggerToolConfig',
�[36m(TaskRunner pid=168816)�[0m                                                                 'config_path': None,
�[36m(TaskRunner pid=168816)�[0m                                                                 'stages': None,
�[36m(TaskRunner pid=168816)�[0m                                                                 'steps': None,
�[36m(TaskRunner pid=168816)�[0m                                                                 'strict': False},
�[36m(TaskRunner pid=168816)�[0m                                          'torch': {'_target_': 'verl.utils.profiler.config.TorchProfilerToolConfig',
�[36m(TaskRunner pid=168816)�[0m                                                    'contents': [],
�[36m(TaskRunner pid=168816)�[0m                                                    'discrete': False},
�[36m(TaskRunner pid=168816)�[0m                                          'torch_memory': {'_target_': 'verl.utils.profiler.config.TorchMemoryToolConfig',
�[36m(TaskRunner pid=168816)�[0m                                                           'stack_depth': 32,
�[36m(TaskRunner pid=168816)�[0m                                                           'trace_alloc_max_entries': 100000}}},
�[36m(TaskRunner pid=168816)�[0m             'rollout_n': 5,
�[36m(TaskRunner pid=168816)�[0m             'shuffle': False,
�[36m(TaskRunner pid=168816)�[0m             'strategy': 'fsdp',
�[36m(TaskRunner pid=168816)�[0m             'ulysses_sequence_parallel_size': 1,
�[36m(TaskRunner pid=168816)�[0m             'use_dynamic_bsz': False},
�[36m(TaskRunner pid=168816)�[0m  'custom_reward_function': {'name': None, 'path': None},
�[36m(TaskRunner pid=168816)�[0m  'data': {'apply_chat_template_kwargs': {},
�[36m(TaskRunner pid=168816)�[0m           'audio_key': 'audios',
�[36m(TaskRunner pid=168816)�[0m           'custom_cls': {'name': None, 'path': None},
�[36m(TaskRunner pid=168816)�[0m           'dataloader_num_workers': 8,
�[36m(TaskRunner pid=168816)�[0m           'filter_overlong_prompts': True,
�[36m(TaskRunner pid=168816)�[0m           'filter_overlong_prompts_workers': 1,
�[36m(TaskRunner pid=168816)�[0m           'function_tool_path': None,
�[36m(TaskRunner pid=168816)�[0m           'image_key': 'images',
�[36m(TaskRunner pid=168816)�[0m           'image_patch_size': 14,
�[36m(TaskRunner pid=168816)�[0m           'max_prompt_length': 512,
�[36m(TaskRunner pid=168816)�[0m           'max_response_length': 1024,
�[36m(TaskRunner pid=168816)�[0m           'mm_processor_kwargs': {},
�[36m(TaskRunner pid=168816)�[0m           'prompt_key': 'prompt',
�[36m(TaskRunner pid=168816)�[0m           'return_full_prompt': False,
�[36m(TaskRunner pid=168816)�[0m           'return_multi_modal_inputs': True,
�[36m(TaskRunner pid=168816)�[0m           'return_raw_chat': True,
�[36m(TaskRunner pid=168816)�[0m           'return_raw_input_ids': False,
�[36m(TaskRunner pid=168816)�[0m           'reward_fn_key': 'data_source',
�[36m(TaskRunner pid=168816)�[0m           'seed': None,
�[36m(TaskRunner pid=168816)�[0m           'shuffle': True,
�[36m(TaskRunner pid=168816)�[0m           'tokenizer': None,
�[36m(TaskRunner pid=168816)�[0m           'tool_config_path': None,
�[36m(TaskRunner pid=168816)�[0m           'train_batch_size': 64,
�[36m(TaskRunner pid=168816)�[0m           'train_files': '/home/xijun.gong/icode/gsm8k/train.parquet',
�[36m(TaskRunner pid=168816)�[0m           'train_max_samples': -1,
�[36m(TaskRunner pid=168816)�[0m           'truncation': 'error',
�[36m(TaskRunner pid=168816)�[0m           'trust_remote_code': False,
�[36m(TaskRunner pid=168816)�[0m           'use_shm': False,
�[36m(TaskRunner pid=168816)�[0m           'val_batch_size': None,
�[36m(TaskRunner pid=168816)�[0m           'val_files': '/home/xijun.gong/icode/gsm8k/test.parquet',
�[36m(TaskRunner pid=168816)�[0m           'val_max_samples': -1,
�[36m(TaskRunner pid=168816)�[0m           'validation_shuffle': False,
�[36m(TaskRunner pid=168816)�[0m           'video_key': 'videos'},
�[36m(TaskRunner pid=168816)�[0m  'distillation': {'_target_': 'verl.workers.config.DistillationConfig',
�[36m(TaskRunner pid=168816)�[0m                   'distillation_loss': {'_target_': 'verl.workers.config.DistillationLossConfig',
�[36m(TaskRunner pid=168816)�[0m                                         'clip_ratio': 0.2,
�[36m(TaskRunner pid=168816)�[0m                                         'clip_ratio_high': 0.2,
�[36m(TaskRunner pid=168816)�[0m                                         'clip_ratio_low': 0.2,
(TaskRunner pid=168816)                                         'distillation_loss_coef': 1.0,
�[36m(TaskRunner pid=168816)�[0m                                         'log_prob_min_clamp': None,
�[36m(TaskRunner pid=168816)�[0m                                         'loss_max_clamp': None,
�[36m(TaskRunner pid=168816)�[0m                                         'loss_mode': 'k3',
�[36m(TaskRunner pid=168816)�[0m                                         'policy_loss_mode': 'vanilla',
�[36m(TaskRunner pid=168816)�[0m                                         'topk': 32,
�[36m(TaskRunner pid=168816)�[0m                                         'use_policy_gradient': False,
�[36m(TaskRunner pid=168816)�[0m                                         'use_task_rewards': True},
�[36m(TaskRunner pid=168816)�[0m                   'enabled': False,
�[36m(TaskRunner pid=168816)�[0m                   'n_gpus_per_node': 8,
�[36m(TaskRunner pid=168816)�[0m                   'nnodes': 0,
�[36m(TaskRunner pid=168816)�[0m                   'teacher_key': 'data_source',
�[36m(TaskRunner pid=168816)�[0m                   'teacher_models': {'teacher_model': {'_target_': 'verl.workers.config.DistillationTeacherModelConfig',
�[36m(TaskRunner pid=168816)�[0m                                                        'inference': {'_target_': 'verl.workers.config.RolloutConfig',
�[36m(TaskRunner pid=168816)�[0m                                                                      'cudagraph_capture_sizes': None,
�[36m(TaskRunner pid=168816)�[0m                                                                      'data_parallel_size': 1,
�[36m(TaskRunner pid=168816)�[0m                                                                      'disable_log_stats': True,
�[36m(TaskRunner pid=168816)�[0m                                                                      'dtype': 'bfloat16',
�[36m(TaskRunner pid=168816)�[0m                                                                      'enable_chunked_prefill': True,
�[36m(TaskRunner pid=168816)�[0m                                                                      'enable_prefix_caching': True,
�[36m(TaskRunner pid=168816)�[0m                                                                      'enforce_eager': True,
�[36m(TaskRunner pid=168816)�[0m                                                                      'engine_kwargs': {},
�[36m(TaskRunner pid=168816)�[0m                                                                      'expert_parallel_size': 1,
�[36m(TaskRunner pid=168816)�[0m                                                                      'free_cache_engine': True,
�[36m(TaskRunner pid=168816)�[0m                                                                      'gpu_memory_utilization': 0.5,
�[36m(TaskRunner pid=168816)�[0m                                                                      'limit_images': None,
�[36m(TaskRunner pid=168816)�[0m                                                                      'load_format': 'auto',
�[36m(TaskRunner pid=168816)�[0m                                                                      'max_model_len': None,
�[36m(TaskRunner pid=168816)�[0m                                                                      'max_num_batched_tokens': 8192,
�[36m(TaskRunner pid=168816)�[0m                                                                      'max_num_seqs': 1024,
�[36m(TaskRunner pid=168816)�[0m                                                                      'name': 'vllm',
�[36m(TaskRunner pid=168816)�[0m                                                                      'prompt_length': 512,
�[36m(TaskRunner pid=168816)�[0m                                                                      'response_length': 1024,
�[36m(TaskRunner pid=168816)�[0m                                                                      'skip_tokenizer_init': True,
�[36m(TaskRunner pid=168816)�[0m                                                                      'temperature': 1.0,
�[36m(TaskRunner pid=168816)�[0m                                                                      'tensor_model_parallel_size': 2},
�[36m(TaskRunner pid=168816)�[0m                                                        'key': None,
�[36m(TaskRunner pid=168816)�[0m                                                        'model_path': None,
�[36m(TaskRunner pid=168816)�[0m                                                        'num_replicas': 0}}},
�[36m(TaskRunner pid=168816)�[0m  'global_profiler': {'_target_': 'verl.utils.profiler.ProfilerConfig',
�[36m(TaskRunner pid=168816)�[0m                      'global_tool_config': {'nsys': {'_target_': 'verl.utils.profiler.config.NsightToolConfig',
�[36m(TaskRunner pid=168816)�[0m                                                      'controller_nsight_options': {'cuda-graph-trace': 'graph',
�[36m(TaskRunner pid=168816)�[0m                                                                                    'cuda-memory-usage': 'true',
�[36m(TaskRunner pid=168816)�[0m                                                                                    'trace': 'cuda,nvtx,cublas,ucx'},
�[36m(TaskRunner pid=168816)�[0m                                                      'discrete': False,
�[36m(TaskRunner pid=168816)�[0m                                                      'worker_nsight_options': {'capture-range': 'cudaProfilerApi',
�[36m(TaskRunner pid=168816)�[0m                                                                                'capture-range-end': None,
�[36m(TaskRunner pid=168816)�[0m                                                                                'cuda-graph-trace': 'graph',
�[36m(TaskRunner pid=168816)�[0m                                                                                'cuda-memory-usage': 'true',
�[36m(TaskRunner pid=168816)�[0m                                                                                'kill': 'none',
�[36m(TaskRunner pid=168816)�[0m                                                                                'trace': 'cuda,nvtx,cublas,ucx'}},
�[36m(TaskRunner pid=168816)�[0m                                             'precision_debugger': {'_target_': 'verl.utils.profiler.config.PrecisionDebuggerToolConfig',
�[36m(TaskRunner pid=168816)�[0m                                                                    'config_path': None,
�[36m(TaskRunner pid=168816)�[0m                                                                    'stages': None,
�[36m(TaskRunner pid=168816)�[0m                                                                    'steps': None,
�[36m(TaskRunner pid=168816)�[0m                                                                    'strict': False},
�[36m(TaskRunner pid=168816)�[0m                                             'torch_memory': {'context': 'all',
�[36m(TaskRunner pid=168816)�[0m                                                              'kw_args': {},
�[36m(TaskRunner pid=168816)�[0m                                                              'stack_depth': 32,
�[36m(TaskRunner pid=168816)�[0m                                                              'stacks': 'all',
�[36m(TaskRunner pid=168816)�[0m                                                              'trace_alloc_max_entries': 100000}},
�[36m(TaskRunner pid=168816)�[0m                      'profile_continuous_steps': False,
�[36m(TaskRunner pid=168816)�[0m                      'save_path': 'outputs/profile',
�[36m(TaskRunner pid=168816)�[0m                      'steps': None,
�[36m(TaskRunner pid=168816)�[0m                      'tool': None},
�[36m(TaskRunner pid=168816)�[0m  'model_engine': 'dp',
�[36m(TaskRunner pid=168816)�[0m  'ray_kwargs': {'ray_init': {'num_cpus': None,
�[36m(TaskRunner pid=168816)�[0m                              'runtime_env': {'env_vars': {'CUDA_DEVICE_MAX_CONNECTIONS': '1',
�[36m(TaskRunner pid=168816)�[0m                                                           'ENFLAME_ENABLE_AUTO_MIGRATION': '1',
�[36m(TaskRunner pid=168816)�[0m                                                           'ENFLAME_TE_KERNEL_TRITON_BACKEND': 'fused_rope_fwd,fused_rope_bwd',
�[36m(TaskRunner pid=168816)�[0m                                                           'NCCL_CUMEM_ENABLE': '0',
�[36m(TaskRunner pid=168816)�[0m                                                           'PYTHONPATH': '/home/xijun.gong/icode/Megatron-LM-FL',
�[36m(TaskRunner pid=168816)�[0m                                                           'RAY_EXPERIMENTAL_NOSET_TOPS_VISIBLE_DEVICES': '1',
�[36m(TaskRunner pid=168816)�[0m                                                           'TOPS_VISIBLE_DEVICES': '0,1,2,3,4,5,6,7',
�[36m(TaskRunner pid=168816)�[0m                                                           'TORCHDYNAMO_DISABLE': '1',
�[36m(TaskRunner pid=168816)�[0m                                                           'TORCHGCU_INDUCTOR_ENABLE': '0',
�[36m(TaskRunner pid=168816)�[0m                                                           'TORCH_ECCL_AVOID_RECORD_STREAMS': '1',
�[36m(TaskRunner pid=168816)�[0m                                                           'VERL_LOGGING_LEVEL': 'DEBUG',
�[36m(TaskRunner pid=168816)�[0m                                                           'VERL_PLATFORM': 'enflame',
�[36m(TaskRunner pid=168816)�[0m                                                           'VERL_USE_EXTERNAL_MODULES': 'verl_hardware_plugin',
�[36m(TaskRunner pid=168816)�[0m                                                           'VLLM_ALL2ALL_BACKEND': 'allgather_reducescatter',
�[36m(TaskRunner pid=168816)�[0m                                                           'VLLM_ENABLE_V1_MULTIPROCESSING': '0'}}},
�[36m(TaskRunner pid=168816)�[0m                 'timeline_json_file': None},
�[36m(TaskRunner pid=168816)�[0m  'reward': {'_target_': 'verl.workers.config.RewardConfig',
�[36m(TaskRunner pid=168816)�[0m             'custom_reward_function': {'name': 'compute_score', 'path': None},
�[36m(TaskRunner pid=168816)�[0m             'num_workers': 8,
�[36m(TaskRunner pid=168816)�[0m             'reward_manager': {'_target_': 'verl.workers.config.reward.RewardManagerConfig',
�[36m(TaskRunner pid=168816)�[0m                                'module': {'_target_': 'verl.trainer.config.config.ModuleConfig',
�[36m(TaskRunner pid=168816)�[0m                                           'name': 'custom_reward_manager',
�[36m(TaskRunner pid=168816)�[0m                                           'path': None},
�[36m(TaskRunner pid=168816)�[0m                                'name': 'naive',
�[36m(TaskRunner pid=168816)�[0m                                'source': 'register'},
�[36m(TaskRunner pid=168816)�[0m             'reward_model': {'_target_': 'verl.workers.config.RewardModelConfig',
�[36m(TaskRunner pid=168816)�[0m                              'enable': False,
�[36m(TaskRunner pid=168816)�[0m                              'enable_resource_pool': False,
�[36m(TaskRunner pid=168816)�[0m                              'model_path': None,
�[36m(TaskRunner pid=168816)�[0m                              'n_gpus_per_node': 8,
�[36m(TaskRunner pid=168816)�[0m                              'nnodes': 0,
�[36m(TaskRunner pid=168816)�[0m                              'rollout': {'_target_': 'verl.workers.config.RolloutConfig',
�[36m(TaskRunner pid=168816)�[0m                                          'cudagraph_capture_sizes': None,
�[36m(TaskRunner pid=168816)�[0m                                          'data_parallel_size': 1,
�[36m(TaskRunner pid=168816)�[0m                                          'disable_log_stats': True,
�[36m(TaskRunner pid=168816)�[0m                                          'disaggregation': {'bootstrap_port': None,
�[36m(TaskRunner pid=168816)�[0m                                                             'decode_replicas': 1,
�[36m(TaskRunner pid=168816)�[0m                                                             'decode_tensor_model_parallel_size': None,
�[36m(TaskRunner pid=168816)�[0m                                                             'enabled': False,
�[36m(TaskRunner pid=168816)�[0m                                                             'ib_device': None,
�[36m(TaskRunner pid=168816)�[0m                                                             'prefill_replicas': 1,
�[36m(TaskRunner pid=168816)�[0m                                                             'transfer_backend': 'nixl'},
�[36m(TaskRunner pid=168816)�[0m                                          'do_sample': True,
�[36m(TaskRunner pid=168816)�[0m                                          'dtype': 'bfloat16',
�[36m(TaskRunner pid=168816)�[0m                                          'enable_chunked_prefill': True,
�[36m(TaskRunner pid=168816)�[0m                                          'enable_prefix_caching': True,
�[36m(TaskRunner pid=168816)�[0m                                          'enforce_eager': True,
�[36m(TaskRunner pid=168816)�[0m                                          'engine_kwargs': {},
�[36m(TaskRunner pid=168816)�[0m                                          'expert_parallel_size': 1,
�[36m(TaskRunner pid=168816)�[0m                                          'free_cache_engine': True,
�[36m(TaskRunner pid=168816)�[0m                                          'gpu_memory_utilization': 0.5,
�[36m(TaskRunner pid=168816)�[0m                                          'ignore_eos': False,
�[36m(TaskRunner pid=168816)�[0m                                          'layered_summon': False,
�[36m(TaskRunner pid=168816)�[0m                                          'limit_images': None,
�[36m(TaskRunner pid=168816)�[0m                                          'load_format': 'auto',
�[36m(TaskRunner pid=168816)�[0m                                          'max_model_len': None,
�[36m(TaskRunner pid=168816)�[0m                                          'max_num_batched_tokens': 8192,
�[36m(TaskRunner pid=168816)�[0m                                          'max_num_seqs': 1024,
�[36m(TaskRunner pid=168816)�[0m                                          'mtp': None,
�[36m(TaskRunner pid=168816)�[0m                                          'multi_stage_wake_up': False,
�[36m(TaskRunner pid=168816)�[0m                                          'n': 1,
�[36m(TaskRunner pid=168816)�[0m                                          'name': '???',
�[36m(TaskRunner pid=168816)�[0m                                          'pipeline_model_parallel_size': 1,
�[36m(TaskRunner pid=168816)�[0m                                          'prompt_length': 2048,
�[36m(TaskRunner pid=168816)�[0m                                          'quantization': None,
�[36m(TaskRunner pid=168816)�[0m                                          'quantization_config_file': None,
�[36m(TaskRunner pid=168816)�[0m                                          'response_length': 2048,
�[36m(TaskRunner pid=168816)�[0m                                          'scheduling_policy': 'fcfs',
�[36m(TaskRunner pid=168816)�[0m                                          'skip_tokenizer_init': False,
�[36m(TaskRunner pid=168816)�[0m                                          'temperature': 1.0,
�[36m(TaskRunner pid=168816)�[0m                                          'tensor_model_parallel_size': 2,
�[36m(TaskRunner pid=168816)�[0m                                          'top_k': -1,
�[36m(TaskRunner pid=168816)�[0m                                          'top_p': 1}},
�[36m(TaskRunner pid=168816)�[0m             'sandbox_fusion': {'_target_': 'verl.workers.config.SandboxFusionConfig',
�[36m(TaskRunner pid=168816)�[0m                                'max_concurrent': 64,
�[36m(TaskRunner pid=168816)�[0m                                'memory_limit_mb': 1024,
�[36m(TaskRunner pid=168816)�[0m                                'url': None}},
�[36m(TaskRunner pid=168816)�[0m  'reward_model': {'enable': None,
�[36m(TaskRunner pid=168816)�[0m                   'enable_resource_pool': None,
�[36m(TaskRunner pid=168816)�[0m                   'model': {'external_lib': None,
�[36m(TaskRunner pid=168816)�[0m                             'path': None,
�[36m(TaskRunner pid=168816)�[0m                             'trust_remote_code': None},
�[36m(TaskRunner pid=168816)�[0m                   'n_gpus_per_node': None,
�[36m(TaskRunner pid=168816)�[0m                   'nnodes': None,
�[36m(TaskRunner pid=168816)�[0m                   'num_workers': None,
�[36m(TaskRunner pid=168816)�[0m                   'reward_loop_class_name': None,
�[36m(TaskRunner pid=168816)�[0m                   'reward_loop_module_path': None,
�[36m(TaskRunner pid=168816)�[0m                   'reward_loop_source': None,
�[36m(TaskRunner pid=168816)�[0m                   'reward_manager': None,
�[36m(TaskRunner pid=168816)�[0m                   'rollout': {'cudagraph_capture_sizes': None,
�[36m(TaskRunner pid=168816)�[0m                               'data_parallel_size': None,
�[36m(TaskRunner pid=168816)�[0m                               'disable_log_stats': None,
�[36m(TaskRunner pid=168816)�[0m                               'dtype': None,
�[36m(TaskRunner pid=168816)�[0m                               'enable_chunked_prefill': None,
�[36m(TaskRunner pid=168816)�[0m                               'enable_prefix_caching': None,
�[36m(TaskRunner pid=168816)�[0m                               'enforce_eager': None,
�[36m(TaskRunner pid=168816)�[0m                               'engine_kwargs': None,
�[36m(TaskRunner pid=168816)�[0m                               'expert_parallel_size': None,
�[36m(TaskRunner pid=168816)�[0m                               'free_cache_engine': None,
�[36m(TaskRunner pid=168816)�[0m                               'gpu_memory_utilization': None,
�[36m(TaskRunner pid=168816)�[0m                               'limit_images': None,
�[36m(TaskRunner pid=168816)�[0m                               'load_format': None,
�[36m(TaskRunner pid=168816)�[0m                               'max_model_len': None,
�[36m(TaskRunner pid=168816)�[0m                               'max_num_batched_tokens': None,
�[36m(TaskRunner pid=168816)�[0m                               'max_num_seqs': None,
�[36m(TaskRunner pid=168816)�[0m                               'name': None,
�[36m(TaskRunner pid=168816)�[0m                               'prompt_length': None,
�[36m(TaskRunner pid=168816)�[0m                               'response_length': None,
�[36m(TaskRunner pid=168816)�[0m                               'skip_tokenizer_init': None,
�[36m(TaskRunner pid=168816)�[0m                               'tensor_model_parallel_size': None}},
�[36m(TaskRunner pid=168816)�[0m  'sandbox_fusion': {'max_concurrent': None,
�[36m(TaskRunner pid=168816)�[0m /usr/local/lib/python3.12/dist-packages/verl/trainer/main_ppo_v0.py:176: UserWarning: Disabled critic as algorithm.adv_estimator != gae. If it is not intended, please set critic.enable=True
�[36m(TaskRunner pid=168816)�[0m   use_critic=need_critic(config),
�[36m(TaskRunner pid=168816)�[0m Setting TOKENIZERS_PARALLELISM=false for forked processes.
�[36m(TaskRunner pid=168816)�[0m WARNING:2026-06-23 08:46:23,546:Setting TOKENIZERS_PARALLELISM=false for forked processes.
�[36m(TaskRunner pid=168816)�[0m /usr/local/lib/python3.12/dist-packages/multiprocess/popen_fork.py:66: DeprecationWarning: This process (pid=168816) is multi-threaded, use of fork() may lead to deadlocks in the child.
�[36m(TaskRunner pid=168816)�[0m   self.pid = os.fork()
�[36m(TaskRunner pid=168816)�[0m Filtering prompts longer than 512 tokens (num_proc=1): 100%|██████████| 7473/7473 [00:02<00:00, 3028.51 examples/s]
�[36m(TaskRunner pid=168816)�[0m Setting TOKENIZERS_PARALLELISM=false for forked processes.
�[36m(TaskRunner pid=168816)�[0m WARNING:2026-06-23 08:46:28,738:Setting TOKENIZERS_PARALLELISM=false for forked processes.
�[36m(TaskRunner pid=168816)�[0m /usr/local/lib/python3.12/dist-packages/multiprocess/popen_fork.py:66: DeprecationWarning: This process (pid=168816) is multi-threaded, use of fork() may lead to deadlocks in the child.
�[36m(TaskRunner pid=168816)�[0m   self.pid = os.fork()
�[36m(TaskRunner pid=168816)�[0m Filtering prompts longer than 512 tokens (num_proc=1): 100%|██████████| 1319/1319 [00:01<00:00, 1295.30 examples/s]
�[36m(TaskRunner pid=168816)�[0m /usr/local/lib/python3.12/dist-packages/verl/trainer/main_ppo_v0.py:217: FutureWarning: Warning: Class 'verl.trainer.ppo.ray_trainer.RayPPOTrainer' is deprecated. Please use 'Legacy trainer is deprecated, and wil be removed in v0.9.0. Please use `trainer.use_v1=True` instead.' instead.
�[36m(TaskRunner pid=168816)�[0m   trainer = RayPPOTrainer(
�[36m(TaskRunner pid=168816)�[0m /usr/local/lib/python3.12/dist-packages/verl/trainer/ppo/ray_trainer.py:348: UserWarning: Disabled critic as algorithm.adv_estimator != gae. If it is not intended, please set critic.enable=True
�[36m(TaskRunner pid=168816)�[0m   self.use_critic = need_critic(self.config)
�[36m(pid=170437)�[0m [W623 08:46:33.220712439 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
�[36m(pid=170436)�[0m [W623 08:46:33.209930871 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
�[36m(pid=170434)�[0m [W623 08:46:33.294928674 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
�[36m(pid=170435)�[0m [W623 08:46:33.349027132 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
�[36m(pid=170437)�[0m /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
�[36m(pid=170437)�[0m   import pkg_resources
�[36m(pid=170436)�[0m /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
�[36m(pid=170436)�[0m   import pkg_resources
�[36m(pid=170434)�[0m /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
�[36m(pid=170434)�[0m   import pkg_resources
�[36m(pid=170435)�[0m /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
�[36m(pid=170435)�[0m   import pkg_resources
�[36m(pid=170434)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:441: ImportWarning: 
�[36m(pid=170434)�[0m     *************************************************************************************************************
�[36m(pid=170434)�[0m     The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.gcu and torch.nn.Module.gcu now..
�[36m(pid=170434)�[0m     The backend in torch.distributed.init_process_group set to eccl now..
�[36m(pid=170434)�[0m     The torch.cuda.* and torch.cuda.amp.* are replaced with torch.gcu.* and torch.gcu.amp.* now..
�[36m(pid=170434)�[0m     The device parameters have been replaced with gcu in the function below:
�[36m(pid=170434)�[0m     torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
�[36m(pid=170434)�[0m     *************************************************************************************************************
�[36m(pid=170434)�[0m     
�[36m(pid=170434)�[0m   warnings.warn(msg, ImportWarning)
�[36m(pid=170437)�[0m INFO:2026-06-23 08:46:34,968:Registered platform: intel (xpu)
�[36m(pid=170437)�[0m INFO:2026-06-23 08:46:34,993:Registered platform: cambricon (mlu)
�[36m(pid=170434)�[0m INFO:2026-06-23 08:46:34,973:Registered platform: intel (xpu)
�[36m(pid=170434)�[0m INFO:2026-06-23 08:46:34,999:Registered platform: cambricon (mlu)
�[36m(pid=170436)�[0m INFO:2026-06-23 08:46:34,985:Registered platform: intel (xpu)
�[36m(pid=170436)�[0m INFO:2026-06-23 08:46:35,011:Registered platform: cambricon (mlu)
�[36m(pid=170435)�[0m INFO:2026-06-23 08:46:34,985:Registered platform: intel (xpu)
�[36m(pid=170435)�[0m INFO:2026-06-23 08:46:35,011:Registered platform: cambricon (mlu)
�[36m(pid=170437)�[0m INFO:2026-06-23 08:46:35,018:Registered platform: metax (cuda)
�[36m(pid=170437)�[0m INFO:2026-06-23 08:46:35,043:Registered platform: enflame (gcu)
�[36m(pid=170434)�[0m INFO:2026-06-23 08:46:35,024:Registered platform: metax (cuda)
�[36m(pid=170434)�[0m INFO:2026-06-23 08:46:35,051:Registered platform: enflame (gcu)
�[36m(pid=170436)�[0m INFO:2026-06-23 08:46:35,037:Registered platform: metax (cuda)
�[36m(pid=170436)�[0m INFO:2026-06-23 08:46:35,064:Registered platform: enflame (gcu)
�[36m(pid=170435)�[0m INFO:2026-06-23 08:46:35,037:Registered platform: metax (cuda)
�[36m(pid=170435)�[0m INFO:2026-06-23 08:46:35,064:Registered platform: enflame (gcu)
�[36m(pid=170437)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=170437)�[0m I0000 00:00:1782204396.186226  170437 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=170437)�[0m I0000 00:00:1782204396.187100  170437 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=170437)�[0m I0000 00:00:1782204396.232552  170437 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=170437)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=170434)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=170434)�[0m I0000 00:00:1782204396.235639  170434 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=170434)�[0m I0000 00:00:1782204396.236329  170434 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=170436)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=170436)�[0m I0000 00:00:1782204396.243815  170436 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=170436)�[0m I0000 00:00:1782204396.244551  170436 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=170435)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=170435)�[0m I0000 00:00:1782204396.189987  170435 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=170435)�[0m I0000 00:00:1782204396.190731  170435 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=170435)�[0m I0000 00:00:1782204396.233627  170435 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=170435)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=170434)�[0m I0000 00:00:1782204396.278341  170434 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=170434)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=170436)�[0m I0000 00:00:1782204396.288062  170436 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=170436)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=170435)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=170435)�[0m I0000 00:00:1782204397.311732  170435 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=170435)�[0m I0000 00:00:1782204397.312129  170435 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=170434)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=170434)�[0m I0000 00:00:1782204397.374770  170434 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=170434)�[0m I0000 00:00:1782204397.375180  170434 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=170436)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=170436)�[0m I0000 00:00:1782204397.373618  170436 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=170436)�[0m I0000 00:00:1782204397.374045  170436 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=170437)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=170437)�[0m I0000 00:00:1782204397.465570  170437 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=170437)�[0m I0000 00:00:1782204397.465963  170437 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=170434)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=170434)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=170434)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=170434)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=170436)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=170436)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=170436)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=170436)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=170435)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=170435)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=170435)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=170435)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=170437)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=170437)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=170437)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=170437)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=170434)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
�[36m(pid=170434)�[0m   warnings.warn(msg, RuntimeWarning)
�[36m(pid=170436)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
�[36m(pid=170436)�[0m   warnings.warn(msg, RuntimeWarning)
�[36m(pid=170435)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
�[36m(pid=170435)�[0m   warnings.warn(msg, RuntimeWarning)
�[36m(pid=170437)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
�[36m(pid=170437)�[0m   warnings.warn(msg, RuntimeWarning)
(pid=170435) INFO:2026-06-23 08:46:39,705:Platform override from VERL_PLATFORM: enflame
(pid=170435) DEBUG:2026-06-23 08:46:39,706:verl platform initialised: gcu
(pid=170434) INFO:2026-06-23 08:46:39,756:Platform override from VERL_PLATFORM: enflame
(pid=170434) DEBUG:2026-06-23 08:46:39,757:verl platform initialised: gcu
(pid=170436) INFO:2026-06-23 08:46:39,768:Platform override from VERL_PLATFORM: enflame
(pid=170436) DEBUG:2026-06-23 08:46:39,769:verl platform initialised: gcu
(pid=170437) INFO:2026-06-23 08:46:39,943:Platform override from VERL_PLATFORM: enflame
(pid=170437) DEBUG:2026-06-23 08:46:39,944:verl platform initialised: gcu
�[36m(pid=170437)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
�[36m(pid=170437)�[0m   warnings.warn(
�[36m(pid=170436)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
�[36m(pid=170436)�[0m   warnings.warn(
�[36m(pid=170435)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
�[36m(pid=170435)�[0m   warnings.warn(
�[36m(pid=170434)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
�[36m(pid=170434)�[0m   warnings.warn(
�[36m(pid=170437)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
�[36m(pid=170437)�[0m   warnings.warn(msg, RuntimeWarning)
�[36m(pid=170434)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
�[36m(pid=170434)�[0m   warnings.warn(msg, RuntimeWarning)
�[36m(pid=170436)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
�[36m(pid=170436)�[0m   warnings.warn(msg, RuntimeWarning)
�[36m(pid=170436)�[0m /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
�[36m(pid=170436)�[0m   from distutils.dep_util import newer, newer_group
�[36m(pid=170435)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
�[36m(pid=170435)�[0m   warnings.warn(msg, RuntimeWarning)
�[36m(pid=170437)�[0m /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
�[36m(pid=170437)�[0m   from distutils.dep_util import newer, newer_group
�[36m(pid=170434)�[0m /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
�[36m(pid=170434)�[0m   from distutils.dep_util import newer, newer_group
�[36m(pid=170435)�[0m /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
�[36m(pid=170435)�[0m   from distutils.dep_util import newer, newer_group
�[36m(pid=170437)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/inference/contexts/__init__.py:9: DeprecationWarning: The following imports from `dynamic_context.py` will be removed in this file in `megatron-core` 0.14. The imports here result in a cyclic import issue that causes rotary embeddings to import from Apex rather than Transformer Engine.
�[36m(pid=170437)�[0m   warnings.warn(
�[36m(pid=170434)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/inference/contexts/__init__.py:9: DeprecationWarning: The following imports from `dynamic_context.py` will be removed in this file in `megatron-core` 0.14. The imports here result in a cyclic import issue that causes rotary embeddings to import from Apex rather than Transformer Engine.
�[36m(pid=170434)�[0m   warnings.warn(
�[36m(pid=170436)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/inference/contexts/__init__.py:9: DeprecationWarning: The following imports from `dynamic_context.py` will be removed in this file in `megatron-core` 0.14. The imports here result in a cyclic import issue that causes rotary embeddings to import from Apex rather than Transformer Engine.
�[36m(pid=170436)�[0m   warnings.warn(
�[36m(pid=170435)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/inference/contexts/__init__.py:9: DeprecationWarning: The following imports from `dynamic_context.py` will be removed in this file in `megatron-core` 0.14. The imports here result in a cyclic import issue that causes rotary embeddings to import from Apex rather than Transformer Engine.
�[36m(pid=170435)�[0m   warnings.warn(
�[36m(pid=170436)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/optimizer/clip_grads.py:32: UserWarning: Transformer Engine and Apex are not installed. Falling back to local implementations of multi_tensor_applier, multi_tensor_l2norm, and multi_tensor_scale
�[36m(pid=170436)�[0m   warnings.warn(
�[36m(pid=170437)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/optimizer/clip_grads.py:32: UserWarning: Transformer Engine and Apex are not installed. Falling back to local implementations of multi_tensor_applier, multi_tensor_l2norm, and multi_tensor_scale
�[36m(pid=170437)�[0m   warnings.warn(
�[36m(pid=170434)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/optimizer/clip_grads.py:32: UserWarning: Transformer Engine and Apex are not installed. Falling back to local implementations of multi_tensor_applier, multi_tensor_l2norm, and multi_tensor_scale
�[36m(pid=170434)�[0m   warnings.warn(
�[36m(pid=170435)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/optimizer/clip_grads.py:32: UserWarning: Transformer Engine and Apex are not installed. Falling back to local implementations of multi_tensor_applier, multi_tensor_l2norm, and multi_tensor_scale
�[36m(pid=170435)�[0m   warnings.warn(
�[36m(pid=170434)�[0m /usr/local/lib/python3.12/dist-packages/torch/_utils.py:916: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
�[36m(pid=170434)�[0m   return self.fget.__get__(instance, owner)()
�[36m(pid=170437)�[0m /usr/local/lib/python3.12/dist-packages/torch/_utils.py:916: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
�[36m(pid=170437)�[0m   return self.fget.__get__(instance, owner)()
�[36m(pid=170436)�[0m /usr/local/lib/python3.12/dist-packages/torch/_utils.py:916: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
�[36m(pid=170436)�[0m   return self.fget.__get__(instance, owner)()
�[36m(pid=170437)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/models/backends.py:34: UserWarning: Apex is not installed. Falling back to Torch Norm
�[36m(pid=170437)�[0m   warnings.warn("Apex is not installed. Falling back to Torch Norm")
�[36m(pid=170434)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/models/backends.py:34: UserWarning: Apex is not installed. Falling back to Torch Norm
�[36m(pid=170434)�[0m   warnings.warn("Apex is not installed. Falling back to Torch Norm")
�[36m(pid=170435)�[0m /usr/local/lib/python3.12/dist-packages/torch/_utils.py:916: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
�[36m(pid=170435)�[0m   return self.fget.__get__(instance, owner)()
�[36m(pid=170436)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/models/backends.py:34: UserWarning: Apex is not installed. Falling back to Torch Norm
�[36m(pid=170436)�[0m   warnings.warn("Apex is not installed. Falling back to Torch Norm")
�[36m(pid=170435)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/models/backends.py:34: UserWarning: Apex is not installed. Falling back to Torch Norm
�[36m(pid=170435)�[0m   warnings.warn("Apex is not installed. Falling back to Torch Norm")
�[36m(pid=170437)�[0m /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/models/gpt/gpt_layer_specs.py:71: UserWarning: Apex is not installed. Falling back to Torch Norm
�[36m(pid=170437)�[0m   warnings.warn("Apex is not installed. Falling back to Torch Norm")
(pid=170437) INFO:2026-06-23 08:47:03,417:Registered engines: fsdp_flagos
(pid=170437) INFO:2026-06-23 08:47:03,417:Registered engines: megatron_flagos
(pid=170437) INFO:2026-06-23 08:47:03,418:Registered engines: fsdp_xpu
(pid=170437) INFO:2026-06-23 08:47:03,418:Registered engines: megatron_xpu
(pid=170437) INFO:2026-06-23 08:47:03,418:Registered engines: fsdp_mlu
(pid=170437) INFO:2026-06-23 08:47:03,419:Registered engines: megatron_mlu
(pid=170437) INFO:2026-06-23 08:47:03,419:Registered engines: fsdp_metax
(pid=170437) INFO:2026-06-23 08:47:03,419:Registered engines: megatron_metax
(pid=170437) INFO:2026-06-23 08:47:03,419:Registered engines: fsdp_enflame
(pid=170437) INFO:2026-06-23 08:47:03,420:Registered engines: megatron_enflame
(pid=170437) INFO:2026-06-23 08:47:03,420:verl-hardware-plugin loaded successfully
(pid=170434) /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/models/gpt/gpt_layer_specs.py:71: UserWarning: Apex is not installed. Falling back to Torch Norm
(pid=170434)   warnings.warn("Apex is not installed. Falling back to Torch Norm")
(pid=170434) INFO:2026-06-23 08:47:03,636:Registered engines: fsdp_flagos
(pid=170434) INFO:2026-06-23 08:47:03,636:Registered engines: megatron_flagos
(pid=170434) INFO:2026-06-23 08:47:03,637:Registered engines: fsdp_xpu
(pid=170434) INFO:2026-06-23 08:47:03,637:Registered engines: megatron_xpu
(pid=170434) INFO:2026-06-23 08:47:03,637:Registered engines: fsdp_mlu
(pid=170434) INFO:2026-06-23 08:47:03,638:Registered engines: megatron_mlu
(pid=170434) INFO:2026-06-23 08:47:03,638:Registered engines: fsdp_metax
(pid=170434) INFO:2026-06-23 08:47:03,638:Registered engines: megatron_metax
(pid=170434) INFO:2026-06-23 08:47:03,638:Registered engines: fsdp_enflame
(pid=170434) INFO:2026-06-23 08:47:03,639:Registered engines: megatron_enflame
(pid=170434) INFO:2026-06-23 08:47:03,639:verl-hardware-plugin loaded successfully
(pid=170436) /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/models/gpt/gpt_layer_specs.py:71: UserWarning: Apex is not installed. Falling back to Torch Norm
(pid=170436)   warnings.warn("Apex is not installed. Falling back to Torch Norm")
(pid=170436) INFO:2026-06-23 08:47:03,842:Registered engines: fsdp_flagos
(pid=170436) INFO:2026-06-23 08:47:03,842:Registered engines: megatron_flagos
(pid=170436) INFO:2026-06-23 08:47:03,843:Registered engines: fsdp_xpu
(pid=170436) INFO:2026-06-23 08:47:03,843:Registered engines: megatron_xpu
(pid=170436) INFO:2026-06-23 08:47:03,843:Registered engines: fsdp_mlu
(pid=170436) INFO:2026-06-23 08:47:03,843:Registered engines: megatron_mlu
(pid=170436) INFO:2026-06-23 08:47:03,844:Registered engines: fsdp_metax
(pid=170436) INFO:2026-06-23 08:47:03,844:Registered engines: megatron_metax
(pid=170436) INFO:2026-06-23 08:47:03,844:Registered engines: fsdp_enflame
(pid=170436) INFO:2026-06-23 08:47:03,844:Registered engines: megatron_enflame
(pid=170436) INFO:2026-06-23 08:47:03,845:verl-hardware-plugin loaded successfully
(pid=170435) /home/xijun.gong/icode/Megatron-LM-FL/megatron/core/models/gpt/gpt_layer_specs.py:71: UserWarning: Apex is not installed. Falling back to Torch Norm
(pid=170435)   warnings.warn("Apex is not installed. Falling back to Torch Norm")
(pid=170435) INFO:2026-06-23 08:47:03,888:Registered engines: fsdp_flagos
(pid=170435) INFO:2026-06-23 08:47:03,888:Registered engines: megatron_flagos
(pid=170435) INFO:2026-06-23 08:47:03,888:Registered engines: fsdp_xpu
(pid=170435) INFO:2026-06-23 08:47:03,889:Registered engines: megatron_xpu
(pid=170435) INFO:2026-06-23 08:47:03,889:Registered engines: fsdp_mlu
(pid=170435) INFO:2026-06-23 08:47:03,889:Registered engines: megatron_mlu
(pid=170435) INFO:2026-06-23 08:47:03,889:Registered engines: fsdp_metax
(pid=170435) INFO:2026-06-23 08:47:03,890:Registered engines: megatron_metax
(pid=170435) INFO:2026-06-23 08:47:03,890:Registered engines: fsdp_enflame
(pid=170435) INFO:2026-06-23 08:47:03,890:Registered engines: megatron_enflame
(pid=170435) INFO:2026-06-23 08:47:03,890:verl-hardware-plugin loaded successfully
�[36m(pid=170435)�[0m /usr/local/lib/python3.12/dist-packages/ray/util/state/util.py:55: DeprecationWarning: Ray state API is no longer experimental. Please import from `ray.util.state`. instead. Importing from `ray.experimental` will be deprecated in future releases. 
�[36m(pid=170435)�[0m   warnings.warn(
�[36m(pid=170437)�[0m /usr/local/lib/python3.12/dist-packages/ray/util/state/util.py:55: DeprecationWarning: Ray state API is no longer experimental. Please import from `ray.util.state`. instead. Importing from `ray.experimental` will be deprecated in future releases. 
�[36m(pid=170437)�[0m   warnings.warn(
�[36m(pid=170434)�[0m /usr/local/lib/python3.12/dist-packages/ray/util/state/util.py:55: DeprecationWarning: Ray state API is no longer experimental. Please import from `ray.util.state`. instead. Importing from `ray.experimental` will be deprecated in future releases. 
�[36m(pid=170434)�[0m   warnings.warn(
�[36m(pid=170436)�[0m /usr/local/lib/python3.12/dist-packages/ray/util/state/util.py:55: DeprecationWarning: Ray state API is no longer experimental. Please import from `ray.util.state`. instead. Importing from `ray.experimental` will be deprecated in future releases. 
�[36m(pid=170436)�[0m   warnings.warn(
(WorkerDict pid=170437) INFO:2026-06-23 08:47:09,409:FSDPEnflameEngineWithLMHead initialized
(WorkerDict pid=170434) INFO:2026-06-23 08:47:09,409:FSDPEnflameEngineWithLMHead initialized
(WorkerDict pid=170436) INFO:2026-06-23 08:47:09,415:FSDPEnflameEngineWithLMHead initialized
(WorkerDict pid=170435) INFO:2026-06-23 08:47:09,413:FSDPEnflameEngineWithLMHead initialized
�[36m(WorkerDict pid=170434)�[0m Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
�[36m(WorkerDict pid=170434)�[0m Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
�[36m(WorkerDict pid=170435)�[0m Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
�[36m(WorkerDict pid=170437)�[0m Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
�[36m(WorkerDict pid=170435)�[0m Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
�[36m(WorkerDict pid=170437)�[0m Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
�[36m(WorkerDict pid=170436)�[0m Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3ForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
�[36m(WorkerDict pid=170436)�[0m Flash Attention 2 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in Qwen3Model is torch.float32. You should run training or inference using Automatic Mixed-Precision via the `with torch.autocast(device_type='torch_device'):` decorator, or load the model with the `torch_dtype` argument. Example: `model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)`
�[36m(TaskRunner pid=168816)�[0m                     'memory_limit_mb': None,
�[36m(TaskRunner pid=168816)�[0m                     'url': None},
�[36m(TaskRunner pid=168816)�[0m  'skip': {'_target_': 'verl.utils.skip.SkipManagerConfig',
�[36m(TaskRunner pid=168816)�[0m           'async_rollout': {'action': 'cache',
�[36m(TaskRunner pid=168816)�[0m                             'dump_dir': '~/.verl/rollout_dump',
�[36m(TaskRunner pid=168816)�[0m                             'enable': False,
�[36m(TaskRunner pid=168816)�[0m                             'steps': []},
�[36m(TaskRunner pid=168816)�[0m           'rollout': {'action': 'cache',
�[36m(TaskRunner pid=168816)�[0m                       'dump_dir': '~/.verl/rollout_dump',
�[36m(TaskRunner pid=168816)�[0m                       'enable': False,
�[36m(TaskRunner pid=168816)�[0m                       'steps': []}},
�[36m(TaskRunner pid=168816)�[0m  'trainer': {'balance_batch': True,
�[36m(TaskRunner pid=168816)�[0m              'critic_warmup': 0,
�[36m(TaskRunner pid=168816)�[0m              'default_hdfs_dir': None,
�[36m(TaskRunner pid=168816)�[0m              'default_local_dir': 'checkpoints/verl_grpo_enflame_fl/qwen3_0.6b_enflame_fl',
�[36m(TaskRunner pid=168816)�[0m              'del_local_ckpt_after_load': False,
�[36m(TaskRunner pid=168816)�[0m              'device': 'gcu',
�[36m(TaskRunner pid=168816)�[0m              'esi_redundant_time': 0,
�[36m(TaskRunner pid=168816)�[0m              'experiment_name': 'qwen3_0.6b_enflame_fl',
�[36m(TaskRunner pid=168816)�[0m              'log_val_generations': 0,
�[36m(TaskRunner pid=168816)�[0m              'logger': ['console'],
�[36m(TaskRunner pid=168816)�[0m              'max_actor_ckpt_to_keep': None,
�[36m(TaskRunner pid=168816)�[0m              'max_critic_ckpt_to_keep': None,
�[36m(TaskRunner pid=168816)�[0m              'n_gpus_per_node': 4,
�[36m(TaskRunner pid=168816)�[0m              'nnodes': 1,
�[36m(TaskRunner pid=168816)�[0m              'project_name': 'verl_grpo_enflame_fl',
�[36m(TaskRunner pid=168816)�[0m              'ray_wait_register_center_timeout': 60,
�[36m(TaskRunner pid=168816)�[0m              'resume_from_path': None,
�[36m(TaskRunner pid=168816)�[0m              'resume_mode': 'auto',
�[36m(TaskRunner pid=168816)�[0m              'rollout_data_dir': None,
�[36m(TaskRunner pid=168816)�[0m              'save_freq': 20,
�[36m(TaskRunner pid=168816)�[0m              'test_freq': 5,
�[36m(TaskRunner pid=168816)�[0m              'total_epochs': 15,
�[36m(TaskRunner pid=168816)�[0m              'total_training_steps': None,
�[36m(TaskRunner pid=168816)�[0m              'use_v1': False,
�[36m(TaskRunner pid=168816)�[0m              'v1': {'colocate_async': {'num_warmup_batches': 1},
�[36m(TaskRunner pid=168816)�[0m                     'sampler': {'custom_sampler': {'name': None, 'path': None},
�[36m(TaskRunner pid=168816)�[0m                                 'max_off_policy_strategy': 'drop',
�[36m(TaskRunner pid=168816)�[0m                                 'max_off_policy_threshold': 8,
�[36m(TaskRunner pid=168816)�[0m                                 'sampler_kwargs': {}},
�[36m(TaskRunner pid=168816)�[0m                     'separate_async': {'num_warmup_batches': 4,
�[36m(TaskRunner pid=168816)�[0m                                        'parameter_sync_step': 4},
�[36m(TaskRunner pid=168816)�[0m                     'sync': {},
�[36m(TaskRunner pid=168816)�[0m                     'trainer_mode': 'sync'},
�[36m(TaskRunner pid=168816)�[0m              'val_before_train': True,
�[36m(TaskRunner pid=168816)�[0m              'val_only': False,
�[36m(TaskRunner pid=168816)�[0m              'validation_data_dir': None},
�[36m(TaskRunner pid=168816)�[0m  'transfer_queue': {'backend': {'MooncakeStore': {'auto_init': False,
�[36m(TaskRunner pid=168816)�[0m                                                   'device_name': '',
�[36m(TaskRunner pid=168816)�[0m                                                   'global_segment_size': 4294967296,
�[36m(TaskRunner pid=168816)�[0m                                                   'local_buffer_size': 1073741824,
�[36m(TaskRunner pid=168816)�[0m                                                   'local_hostname': 'localhost',
�[36m(TaskRunner pid=168816)�[0m                                                   'master_server_address': 'localhost:50124',
�[36m(TaskRunner pid=168816)�[0m                                                   'metadata_server': 'localhost:50123',
�[36m(TaskRunner pid=168816)�[0m                                                   'protocol': 'tcp'},
�[36m(TaskRunner pid=168816)�[0m                                 'SimpleStorage': {'num_data_storage_units': 8,
�[36m(TaskRunner pid=168816)�[0m                                                   'total_storage_size': 100000},
�[36m(TaskRunner pid=168816)�[0m                                 'storage_backend': 'SimpleStorage'},
�[36m(TaskRunner pid=168816)�[0m                     'enable': False,
�[36m(TaskRunner pid=168816)�[0m                     'metrics': {'enabled': False, 'port': 0}}}
�[36m(TaskRunner pid=168816)�[0m [validate_config] All configuration checks passed successfully!
�[36m(TaskRunner pid=168816)�[0m Using dataset class: RLHFDataset
�[36m(TaskRunner pid=168816)�[0m dataset len: 7473
�[36m(TaskRunner pid=168816)�[0m filter dataset len: 7473
�[36m(TaskRunner pid=168816)�[0m Using dataset class: RLHFDataset
�[36m(TaskRunner pid=168816)�[0m dataset len: 1319
�[36m(TaskRunner pid=168816)�[0m filter dataset len: 1319
�[36m(TaskRunner pid=168816)�[0m Size of train dataloader: 116, Size of val dataloader: 1
�[36m(TaskRunner pid=168816)�[0m Total training steps: 1740
�[36m(TaskRunner pid=168816)�[0m colocated worker base class <class 'verl.single_controller.base.worker.Worker'>
(pid=170435) +++++++++++++++++transformer_engine...........
(pid=170436) +++++++++++++++++transformer_engine...........
(pid=170434) +++++++++++++++++transformer_engine...........
(pid=170437) +++++++++++++++++transformer_engine...........
(pid=170436) Megatron-LM-FL Platform: cpu Registered
(pid=170436) Megatron-LM-FL Platform: cuda Registered
(pid=170437) Megatron-LM-FL Platform: cpu Registered
(pid=170437) Megatron-LM-FL Platform: cuda Registered
(pid=170435) Megatron-LM-FL Platform: cpu Registered
(pid=170435) Megatron-LM-FL Platform: cuda Registered
(pid=170436) Megatron-LM-FL Platform: enflame Registered
(pid=170436) Megatron-LM-FL Platform: cuda Selected
(pid=170437) Megatron-LM-FL Platform: enflame Registered
(pid=170437) Megatron-LM-FL Platform: cuda Selected
(pid=170435) Megatron-LM-FL Platform: enflame Registered
(pid=170435) Megatron-LM-FL Platform: cuda Selected
(pid=170434) Megatron-LM-FL Platform: cpu Registered
(pid=170434) Megatron-LM-FL Platform: cuda Registered
(pid=170434) Megatron-LM-FL Platform: enflame Registered
(pid=170434) Megatron-LM-FL Platform: cuda Selected
(WorkerDict pid=170436) [Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
(WorkerDict pid=170436) Warning: Failed to set NUMA affinity: NVML Shared Library Not Found
(WorkerDict pid=170437) [Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
(WorkerDict pid=170437) Warning: Failed to set NUMA affinity: NVML Shared Library Not Found
(WorkerDict pid=170434) [Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
(WorkerDict pid=170434) Warning: Failed to set NUMA affinity: NVML Shared Library Not Found
(WorkerDict pid=170435) [Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
(WorkerDict pid=170435) Warning: Failed to set NUMA affinity: NVML Shared Library Not Found
(WorkerDict pid=170436) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention
(WorkerDict pid=170434) DEBUG:2026-06-23 08:47:10,847:After init model from HF AutoModel, memory allocated (GB): 0.00, memory reserved (GB): 0.00, device memory used/total (GB): 2.20/144.00
(WorkerDict pid=170437) /usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:675: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=170437)   warnings.warn(
(WorkerDict pid=170437) INFO:2026-06-23 08:47:12,186:FSDPEnflameEngineWithLMHead initialized for ENFLAME
(WorkerDict pid=170437) INFO:2026-06-23 08:47:12,198:FSDPEnflameEngineWithLMHead initialized
(WorkerDict pid=170434) /usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:675: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=170434)   warnings.warn(
(WorkerDict pid=170434) DEBUG:2026-06-23 08:47:12,190:After offload model/optimizer/grad during init, memory allocated (GB): 0.00, memory reserved (GB): 1.46, device memory used/total (GB): 4.33/144.00
(WorkerDict pid=170434) INFO:2026-06-23 08:47:12,190:FSDPEnflameEngineWithLMHead initialized for ENFLAME
(WorkerDict pid=170436) /usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:675: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=170436)   warnings.warn(
(WorkerDict pid=170436) INFO:2026-06-23 08:47:12,186:FSDPEnflameEngineWithLMHead initialized for ENFLAME
(WorkerDict pid=170436) INFO:2026-06-23 08:47:12,198:FSDPEnflameEngineWithLMHead initialized
(WorkerDict pid=170435) /usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:675: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=170435)   warnings.warn(
(WorkerDict pid=170435) INFO:2026-06-23 08:47:12,189:FSDPEnflameEngineWithLMHead initialized for ENFLAME
(WorkerDict pid=170435) INFO:2026-06-23 08:47:12,200:FSDPEnflameEngineWithLMHead initialized
(WorkerDict pid=170434) INFO:2026-06-23 08:47:12,201:FSDPEnflameEngineWithLMHead initialized
(WorkerDict pid=170434) DEBUG:2026-06-23 08:47:13,294:After init model from HF AutoModel, memory allocated (GB): 0.00, memory reserved (GB): 1.46, device memory used/total (GB): 4.33/144.00
(WorkerDict pid=170437) /usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:675: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=170437)   warnings.warn(
(WorkerDict pid=170437) INFO:2026-06-23 08:47:13,734:FSDPEnflameEngineWithLMHead initialized for ENFLAME
(WorkerDict pid=170436) /usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:675: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=170436)   warnings.warn(
(WorkerDict pid=170436) INFO:2026-06-23 08:47:13,733:FSDPEnflameEngineWithLMHead initialized for ENFLAME
(WorkerDict pid=170435) /usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:675: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=170435)   warnings.warn(
(WorkerDict pid=170435) INFO:2026-06-23 08:47:13,730:FSDPEnflameEngineWithLMHead initialized for ENFLAME
(WorkerDict pid=170434) /usr/local/lib/python3.12/dist-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:675: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict .Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
(WorkerDict pid=170434)   warnings.warn(
(WorkerDict pid=170434) DEBUG:2026-06-23 08:47:13,833:After offload model/optimizer/grad during init, memory allocated (GB): 0.56, memory reserved (GB): 2.04, device memory used/total (GB): 4.91/144.00
(WorkerDict pid=170434) INFO:2026-06-23 08:47:13,833:FSDPEnflameEngineWithLMHead initialized for ENFLAME
(WorkerDict pid=170434) <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
(WorkerDict pid=170434) <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
(WorkerDict pid=170437) <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
(WorkerDict pid=170437) <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
(WorkerDict pid=170436) <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
(WorkerDict pid=170436) <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
(WorkerDict pid=170435) <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
(WorkerDict pid=170435) <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
(WorkerDict pid=170434) INFO:2026-06-23 08:47:17,140:Memory cleanup attempt 1: Freed 1.28 GB reserved, 0.00 GB allocated
(WorkerDict pid=170437) INFO:2026-06-23 08:47:17,388:Memory cleanup attempt 1: Freed 1.28 GB reserved, 0.00 GB allocated
(WorkerDict pid=170435) INFO:2026-06-23 08:47:17,373:Memory cleanup attempt 1: Freed 1.28 GB reserved, 0.00 GB allocated
(WorkerDict pid=170436) INFO:2026-06-23 08:47:17,457:Memory cleanup attempt 1: Freed 1.28 GB reserved, 0.00 GB allocated
(WorkerDict pid=170434) INFO:2026-06-23 08:47:18,006:Memory cleanup attempt 2: Freed 0.00 GB reserved, 0.00 GB allocated
(WorkerDict pid=170435) INFO:2026-06-23 08:47:18,154:Memory cleanup attempt 2: Freed 0.00 GB reserved, 0.00 GB allocated
(WorkerDict pid=170437) INFO:2026-06-23 08:47:18,212:Memory cleanup attempt 2: Freed 0.00 GB reserved, 0.00 GB allocated
(WorkerDict pid=170436) INFO:2026-06-23 08:47:18,237:Memory cleanup attempt 2: Freed 0.00 GB reserved, 0.00 GB allocated
(TaskRunner pid=168816) <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute
(TaskRunner pid=168816) <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute
(pid=171897) [W623 08:47:20.665929520 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
(pid=171895) [W623 08:47:20.666452623 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
(pid=171894) [W623 08:47:20.695083222 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
(pid=171898) [W623 08:47:20.682125756 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
(pid=171896) [W623 08:47:20.711520096 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
(pid=171891) [W623 08:47:20.817604034 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
(pid=171892) [W623 08:47:20.785815456 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
(pid=171897) /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
(pid=171897)   import pkg_resources
(pid=171895) /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
(pid=171895)   import pkg_resources
(pid=171893) [W623 08:47:20.825818399 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
(pid=171896) /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
(pid=171896)   import pkg_resources
(pid=171894) /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
(pid=171894)   import pkg_resources
(pid=171898) /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
(pid=171898)   import pkg_resources
(pid=171892) /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
(pid=171892)   import pkg_resources
(pid=171891) /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
(pid=171891)   import pkg_resources
(pid=171893) /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
(pid=171893)   import pkg_resources
(pid=171896) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:443: ImportWarning:
(pid=171896)     *************************************************************************************************************
(pid=171896)     The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.gcu and torch.nn.Module.gcu now..
(pid=171896)     The backend in torch.distributed.init_process_group set to eccl now..
(pid=171896)     The torch.cuda.* and torch.cuda.amp.* are replaced with torch.gcu.* and torch.gcu.amp.* now..
(pid=171896)     The device parameters have been replaced with gcu in the function below:
(pid=171896)     torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
(pid=171896)     *************************************************************************************************************
(pid=171896)
(pid=171896)   warnings.warn(msg, ImportWarning)
(pid=171896) INFO:2026-06-23 08:47:22,307:Registered platform: intel (xpu)
(pid=171896) INFO:2026-06-23 08:47:22,307:Registered platform: cambricon (mlu)
(pid=171896) INFO:2026-06-23 08:47:22,307:Registered platform: metax (cuda)
(pid=171896) INFO:2026-06-23 08:47:22,308:Registered platform: enflame (gcu)
(pid=171895) INFO:2026-06-23 08:47:22,340:Registered platform: intel (xpu)
(pid=171895) INFO:2026-06-23 08:47:22,341:Registered platform: cambricon (mlu)
(pid=171895) INFO:2026-06-23 08:47:22,341:Registered platform: metax (cuda)
(pid=171895) INFO:2026-06-23 08:47:22,341:Registered platform: enflame (gcu)
(pid=171897) INFO:2026-06-23 08:47:22,412:Registered platform: intel (xpu)
(pid=171897) INFO:2026-06-23 08:47:22,413:Registered platform: cambricon (mlu)
(pid=171897) INFO:2026-06-23 08:47:22,413:Registered platform: metax (cuda)
(pid=171897) INFO:2026-06-23 08:47:22,413:Registered platform: enflame (gcu)
(pid=171894) INFO:2026-06-23 08:47:22,402:Registered platform: intel (xpu)
(pid=171894) INFO:2026-06-23 08:47:22,404:Registered platform: cambricon (mlu)
(pid=171894) INFO:2026-06-23 08:47:22,404:Registered platform: metax (cuda)
(pid=171894) INFO:2026-06-23 08:47:22,404:Registered platform: enflame (gcu)
(pid=171898) INFO:2026-06-23 08:47:22,397:Registered platform: intel (xpu)
(pid=171898) INFO:2026-06-23 08:47:22,398:Registered platform: cambricon (mlu)
(pid=171898) INFO:2026-06-23 08:47:22,398:Registered platform: metax (cuda)
(pid=171898) INFO:2026-06-23 08:47:22,399:Registered platform: enflame (gcu)
(pid=171892) INFO:2026-06-23 08:47:22,397:Registered platform: intel (xpu)
(pid=171892) INFO:2026-06-23 08:47:22,398:Registered platform: cambricon (mlu)
(pid=171892) INFO:2026-06-23 08:47:22,398:Registered platform: metax (cuda)
(pid=171892) INFO:2026-06-23 08:47:22,399:Registered platform: enflame (gcu)
�[36m(pid=171893)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:443: ImportWarning: 
�[36m(pid=171893)�[0m     *************************************************************************************************************
�[36m(pid=171893)�[0m     The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.gcu and torch.nn.Module.gcu now..
�[36m(pid=171893)�[0m     The backend in torch.distributed.init_process_group set to eccl now..
�[36m(pid=171893)�[0m     The torch.cuda.* and torch.cuda.amp.* are replaced with torch.gcu.* and torch.gcu.amp.* now..
�[36m(pid=171893)�[0m     The device parameters have been replaced with gcu in the function below:
�[36m(pid=171893)�[0m     torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
�[36m(pid=171893)�[0m     *************************************************************************************************************
�[36m(pid=171893)�[0m     
�[36m(pid=171893)�[0m   warnings.warn(msg, ImportWarning)
�[36m(pid=171891)�[0m INFO:2026-06-23 08:47:22,517:Registered platform: intel (xpu)
�[36m(pid=171891)�[0m INFO:2026-06-23 08:47:22,518:Registered platform: cambricon (mlu)
�[36m(pid=171891)�[0m INFO:2026-06-23 08:47:22,518:Registered platform: metax (cuda)
�[36m(pid=171891)�[0m INFO:2026-06-23 08:47:22,519:Registered platform: enflame (gcu)
�[36m(pid=171893)�[0m INFO:2026-06-23 08:47:22,549:Registered platform: intel (xpu)
�[36m(pid=171893)�[0m INFO:2026-06-23 08:47:22,550:Registered platform: cambricon (mlu)
�[36m(pid=171893)�[0m INFO:2026-06-23 08:47:22,550:Registered platform: metax (cuda)
�[36m(pid=171893)�[0m INFO:2026-06-23 08:47:22,551:Registered platform: enflame (gcu)
�[36m(pid=171896)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171896)�[0m I0000 00:00:1782204443.499578  171896 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171896)�[0m I0000 00:00:1782204443.500304  171896 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171895)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171895)�[0m I0000 00:00:1782204443.530700  171895 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171895)�[0m I0000 00:00:1782204443.531624  171895 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171896)�[0m I0000 00:00:1782204443.543573  171896 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=171896)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=171897)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171897)�[0m I0000 00:00:1782204443.603577  171897 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171897)�[0m I0000 00:00:1782204443.604759  171897 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171895)�[0m I0000 00:00:1782204443.576534  171895 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=171895)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=171894)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171894)�[0m I0000 00:00:1782204443.603577  171894 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171894)�[0m I0000 00:00:1782204443.604781  171894 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171898)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171898)�[0m I0000 00:00:1782204443.603577  171898 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171898)�[0m I0000 00:00:1782204443.604771  171898 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171892)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171892)�[0m I0000 00:00:1782204443.603612  171892 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171892)�[0m I0000 00:00:1782204443.604773  171892 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171897)�[0m I0000 00:00:1782204443.647403  171897 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=171897)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=171891)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171891)�[0m I0000 00:00:1782204443.749142  171891 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171891)�[0m I0000 00:00:1782204443.749888  171891 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171894)�[0m I0000 00:00:1782204443.647407  171894 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=171894)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=171898)�[0m I0000 00:00:1782204443.647403  171898 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=171898)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=171892)�[0m I0000 00:00:1782204443.647404  171892 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=171892)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=171893)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171893)�[0m I0000 00:00:1782204443.724939  171893 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171893)�[0m I0000 00:00:1782204443.725725  171893 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171891)�[0m I0000 00:00:1782204443.792614  171891 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=171891)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=171893)�[0m I0000 00:00:1782204443.771123  171893 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
�[36m(pid=171893)�[0m To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
�[36m(pid=172375)�[0m [W623 08:47:24.186899285 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
�[36m(pid=172376)�[0m [W623 08:47:24.164816747 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
�[36m(pid=172377)�[0m [W623 08:47:24.254537161 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
�[36m(pid=172376)�[0m /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
�[36m(pid=172376)�[0m   import pkg_resources
�[36m(pid=172378)�[0m [W623 08:47:24.339086391 gcu_caching_allocator.cpp:2979] Warning: [GCU Allocator] Static initialization: PYTORCH_GCU_ALLOC_CONF="(not set, using default)", selected_backend=native (function BackendStaticInitializer)
�[36m(pid=172375)�[0m /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
�[36m(pid=172375)�[0m   import pkg_resources
�[36m(pid=172377)�[0m /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
�[36m(pid=172377)�[0m   import pkg_resources
�[36m(pid=172378)�[0m /usr/local/lib/python3.12/dist-packages/triton_gcu/__init__.py:18: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
�[36m(pid=172378)�[0m   import pkg_resources
�[36m(pid=171896)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171896)�[0m I0000 00:00:1782204444.706041  171896 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171896)�[0m I0000 00:00:1782204444.706479  171896 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171895)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171895)�[0m I0000 00:00:1782204444.721309  171895 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171895)�[0m I0000 00:00:1782204444.721839  171895 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171892)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171892)�[0m I0000 00:00:1782204444.829186  171892 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171892)�[0m I0000 00:00:1782204444.829656  171892 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171896)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171896)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=171896)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171896)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=171897)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171897)�[0m I0000 00:00:1782204444.878269  171897 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171897)�[0m I0000 00:00:1782204444.878672  171897 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171895)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171895)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=171895)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171895)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=171894)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171894)�[0m I0000 00:00:1782204444.857860  171894 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171894)�[0m I0000 00:00:1782204444.858257  171894 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171898)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171898)�[0m I0000 00:00:1782204444.866028  171898 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171898)�[0m I0000 00:00:1782204444.866419  171898 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171893)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171893)�[0m I0000 00:00:1782204444.899469  171893 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171893)�[0m I0000 00:00:1782204444.899881  171893 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171897)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171897)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=171897)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171897)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=171891)�[0m WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
�[36m(pid=171891)�[0m I0000 00:00:1782204445.021009  171891 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
�[36m(pid=171891)�[0m I0000 00:00:1782204445.021464  171891 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
�[36m(pid=171894)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171894)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=171894)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171894)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=171898)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171898)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=171898)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171898)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=171892)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171892)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=171892)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171892)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=171893)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171893)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=171893)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171893)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=171891)�[0m /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171891)�[0m   EPOCH = datetime.datetime.utcfromtimestamp(0)
�[36m(pid=171891)�[0m /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
�[36m(pid=171891)�[0m   _epoch = datetime.utcfromtimestamp(0)
�[36m(pid=172375)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:443: ImportWarning: 
�[36m(pid=172375)�[0m     *************************************************************************************************************
�[36m(pid=172375)�[0m     The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.gcu and torch.nn.Module.gcu now..
�[36m(pid=172375)�[0m     The backend in torch.distributed.init_process_group set to eccl now..
�[36m(pid=172375)�[0m     The torch.cuda.* and torch.cuda.amp.* are replaced with torch.gcu.* and torch.gcu.amp.* now..
�[36m(pid=172375)�[0m     The device parameters have been replaced with gcu in the function below:
�[36m(pid=172375)�[0m     torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
�[36m(pid=172375)�[0m     *************************************************************************************************************
�[36m(pid=172375)�[0m     
�[36m(pid=172375)�[0m   warnings.warn(msg, ImportWarning)
�[36m(pid=172376)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:443: ImportWarning: 
�[36m(pid=172376)�[0m     *************************************************************************************************************
�[36m(pid=172376)�[0m     The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.gcu and torch.nn.Module.gcu now..
�[36m(pid=172376)�[0m     The backend in torch.distributed.init_process_group set to eccl now..
�[36m(pid=172376)�[0m     The torch.cuda.* and torch.cuda.amp.* are replaced with torch.gcu.* and torch.gcu.amp.* now..
�[36m(pid=172376)�[0m     The device parameters have been replaced with gcu in the function below:
�[36m(pid=172376)�[0m     torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
�[36m(pid=172376)�[0m     *************************************************************************************************************
�[36m(pid=172376)�[0m     
�[36m(pid=172376)�[0m   warnings.warn(msg, ImportWarning)
�[36m(pid=172375)�[0m INFO:2026-06-23 08:47:25,847:Registered platform: intel (xpu)
�[36m(pid=172375)�[0m INFO:2026-06-23 08:47:25,848:Registered platform: cambricon (mlu)
�[36m(pid=172375)�[0m INFO:2026-06-23 08:47:25,848:Registered platform: metax (cuda)
�[36m(pid=172375)�[0m INFO:2026-06-23 08:47:25,848:Registered platform: enflame (gcu)
�[36m(pid=172376)�[0m INFO:2026-06-23 08:47:25,824:Registered platform: intel (xpu)
�[36m(pid=172376)�[0m INFO:2026-06-23 08:47:25,825:Registered platform: cambricon (mlu)
�[36m(pid=172376)�[0m INFO:2026-06-23 08:47:25,825:Registered platform: metax (cuda)
�[36m(pid=172376)�[0m INFO:2026-06-23 08:47:25,826:Registered platform: enflame (gcu)
�[36m(pid=172377)�[0m /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:443: ImportWarning: 
�[36m(pid=172377)�[0m     *************************************************************************************************************
�[36m(pid=172377)�[0m     The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.gcu and torch.nn.Module.gcu now..
�[36m(pid=172377)�[0m     The backend in torch.distributed.init_process_group set to eccl now..
�[36m(pid=172377)�[0m     The torch.cuda.* and torch.cuda.amp.* are replaced with torch.gcu.* and torch.gcu.amp.* now..
�[36m(pid=172377)�[0m     The device parameters have been replaced with gcu in the function below:
�[36m(pid=172377)�[0m     torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
�[36m(pid=172377)�[0m     *************************************************************************************************************
�[36m(pid=172377)�[0m     
�[36m(pid=172377)�[0m   warnings.warn(msg, ImportWarning)
�[36m(pid=172377)�[0m INFO:2026-06-23 08:47:25,930:Registered platform: intel (xpu)
(pid=172377) INFO:2026-06-23 08:47:25,931:Registered platform: cambricon (mlu)
(pid=172377) INFO:2026-06-23 08:47:25,931:Registered platform: metax (cuda)
(pid=172377) INFO:2026-06-23 08:47:25,931:Registered platform: enflame (gcu)
(pid=172378) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:443: ImportWarning: 
(pid=172378)     *************************************************************************************************************
(pid=172378)     The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.gcu and torch.nn.Module.gcu now..
(pid=172378)     The backend in torch.distributed.init_process_group set to eccl now..
(pid=172378)     The torch.cuda.* and torch.cuda.amp.* are replaced with torch.gcu.* and torch.gcu.amp.* now..
(pid=172378)     The device parameters have been replaced with gcu in the function below:
(pid=172378)     torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.load, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
(pid=172378)     *************************************************************************************************************
(pid=172378)
(pid=172378)   warnings.warn(msg, ImportWarning)
(pid=172378) INFO:2026-06-23 08:47:26,077:Registered platform: intel (xpu)
(pid=172378) INFO:2026-06-23 08:47:26,078:Registered platform: cambricon (mlu)
(pid=172378) INFO:2026-06-23 08:47:26,078:Registered platform: metax (cuda)
(pid=172378) INFO:2026-06-23 08:47:26,079:Registered platform: enflame (gcu)
(pid=171896) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171896)   warnings.warn(msg, RuntimeWarning)
(pid=171895) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171895)   warnings.warn(msg, RuntimeWarning)
(pid=171892) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171892)   warnings.warn(msg, RuntimeWarning)
(pid=172375) WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
(pid=172375) I0000 00:00:1782204447.039716  172375 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=172375) I0000 00:00:1782204447.040451  172375 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
(pid=172376) WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
(pid=172376) I0000 00:00:1782204447.050506  172376 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=172376) I0000 00:00:1782204447.051199  172376 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
(pid=171897) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171897)   warnings.warn(msg, RuntimeWarning)
(pid=171895) INFO:2026-06-23 08:47:27,173:Platform override from VERL_PLATFORM: enflame
(pid=171895) DEBUG:2026-06-23 08:47:27,173:verl platform initialised: gcu
(pid=171894) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171894)   warnings.warn(msg, RuntimeWarning)
(pid=171898) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171898)   warnings.warn(msg, RuntimeWarning)
(pid=171893) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171893)   warnings.warn(msg, RuntimeWarning)
(pid=172375) I0000 00:00:1782204447.085266  172375 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
(pid=172375) To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
(pid=172376) I0000 00:00:1782204447.093557  172376 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
(pid=172376) To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
(pid=172377) WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
(pid=172377) I0000 00:00:1782204447.125300  172377 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=172377) I0000 00:00:1782204447.126049  172377 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
(pid=172377) I0000 00:00:1782204447.170724  172377 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
(pid=172377) To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
(pid=171896) INFO:2026-06-23 08:47:27,232:Platform override from VERL_PLATFORM: enflame
(pid=171896) DEBUG:2026-06-23 08:47:27,233:verl platform initialised: gcu
(pid=171891) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171891)   warnings.warn(msg, RuntimeWarning)
(pid=172378) WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
(pid=172378) I0000 00:00:1782204447.294416  172378 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=172378) I0000 00:00:1782204447.295433  172378 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
(pid=171894) INFO:2026-06-23 08:47:27,394:Platform override from VERL_PLATFORM: enflame
(pid=171894) DEBUG:2026-06-23 08:47:27,395:verl platform initialised: gcu
(pid=171898) INFO:2026-06-23 08:47:27,384:Platform override from VERL_PLATFORM: enflame
(pid=171898) DEBUG:2026-06-23 08:47:27,385:verl platform initialised: gcu
(pid=171892) INFO:2026-06-23 08:47:27,359:Platform override from VERL_PLATFORM: enflame
(pid=171892) DEBUG:2026-06-23 08:47:27,360:verl platform initialised: gcu
(pid=172378) I0000 00:00:1782204447.340332  172378 cpu_feature_guard.cc:227] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
(pid=172378) To enable the following instructions: AVX2 AVX512F AVX512_VNNI AVX512_BF16 AVX512_FP16 AVX_VNNI AMX_TILE AMX_INT8 AMX_BF16 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
(pid=171897) INFO:2026-06-23 08:47:27,433:Platform override from VERL_PLATFORM: enflame
(pid=171897) DEBUG:2026-06-23 08:47:27,434:verl platform initialised: gcu
(pid=171893) INFO:2026-06-23 08:47:27,417:Platform override from VERL_PLATFORM: enflame
(pid=171893) DEBUG:2026-06-23 08:47:27,418:verl platform initialised: gcu
(pid=171891) INFO:2026-06-23 08:47:27,582:Platform override from VERL_PLATFORM: enflame
(pid=171891) DEBUG:2026-06-23 08:47:27,582:verl platform initialised: gcu
(pid=172375) WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
(pid=172375) I0000 00:00:1782204448.201505  172375 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=172375) I0000 00:00:1782204448.201938  172375 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
(pid=172376) WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
(pid=172376) I0000 00:00:1782204448.182078  172376 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=172376) I0000 00:00:1782204448.182452  172376 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
(pid=172375) /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
(pid=172375)   EPOCH = datetime.datetime.utcfromtimestamp(0)
(pid=172375) /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
(pid=172375)   _epoch = datetime.utcfromtimestamp(0)
(pid=172376) /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
(pid=172376)   EPOCH = datetime.datetime.utcfromtimestamp(0)
(pid=172376) /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
(pid=172376)   _epoch = datetime.utcfromtimestamp(0)
(pid=172377) WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
(pid=172377) I0000 00:00:1782204448.289426  172377 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=172377) I0000 00:00:1782204448.289847  172377 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
(pid=172377) /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
(pid=172377)   EPOCH = datetime.datetime.utcfromtimestamp(0)
(pid=172377) /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
(pid=172377)   _epoch = datetime.utcfromtimestamp(0)
(pid=172378) WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
(pid=172378) I0000 00:00:1782204448.432293  172378 port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
(pid=172378) I0000 00:00:1782204448.432705  172378 cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
(pid=172378) /usr/local/lib/python3.12/dist-packages/dateutil/tz/tz.py:37: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
(pid=172378)   EPOCH = datetime.datetime.utcfromtimestamp(0)
(pid=172378) /usr/local/lib/python3.12/dist-packages/pytz/tzinfo.py:27: DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC).
(pid=172378)   _epoch = datetime.utcfromtimestamp(0)
(pid=172377) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=172377)   warnings.warn(msg, RuntimeWarning)
(pid=172375) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=172375)   warnings.warn(msg, RuntimeWarning)
(pid=172376) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=172376)   warnings.warn(msg, RuntimeWarning)
(pid=172378) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=172378)   warnings.warn(msg, RuntimeWarning)
(pid=172376) INFO:2026-06-23 08:47:30,264:Platform override from VERL_PLATFORM: enflame
(pid=172376) DEBUG:2026-06-23 08:47:30,264:verl platform initialised: gcu
(pid=172377) INFO:2026-06-23 08:47:30,248:Platform override from VERL_PLATFORM: enflame
(pid=172377) DEBUG:2026-06-23 08:47:30,249:verl platform initialised: gcu
(pid=172375) INFO:2026-06-23 08:47:30,347:Platform override from VERL_PLATFORM: enflame
(pid=172375) DEBUG:2026-06-23 08:47:30,347:verl platform initialised: gcu
(pid=172378) INFO:2026-06-23 08:47:30,395:Platform override from VERL_PLATFORM: enflame
(pid=172378) DEBUG:2026-06-23 08:47:30,395:verl platform initialised: gcu
(pid=171896) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=171896)   warnings.warn(
(pid=171894) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=171894)   warnings.warn(
(pid=171898) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=171898)   warnings.warn(
(pid=171897) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=171897)   warnings.warn(
(pid=171895) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=171895)   warnings.warn(
(pid=171892) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=171892)   warnings.warn(
(pid=171891) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=171891)   warnings.warn(
(pid=171893) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=171893)   warnings.warn(
(pid=172376) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=172376)   warnings.warn(
(pid=172377) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=172377)   warnings.warn(
(pid=172378) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=172378)   warnings.warn(
(pid=171896) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171896)   warnings.warn(msg, RuntimeWarning)
(pid=171894) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171894)   warnings.warn(msg, RuntimeWarning)
(pid=171892) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171892)   warnings.warn(msg, RuntimeWarning)
(WorkerDict pid=170436) Skipping monkey patch for Qwen3ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch
(WorkerDict pid=170437) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention
(WorkerDict pid=170437) Skipping monkey patch for Qwen3ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch
(WorkerDict pid=170434) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention
(WorkerDict pid=170434) Skipping monkey patch for Qwen3ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch
(WorkerDict pid=170435) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention
(WorkerDict pid=170435) Skipping monkey patch for Qwen3ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch
(WorkerDict pid=170434) Qwen3ForCausalLM contains 596.05M parameters
(WorkerDict pid=170434) Before FSDP, memory allocated (GB): 0.00, memory reserved (GB): 0.00, device memory used/total (GB): 2.20/144.00
(WorkerDict pid=170434) ECCL version 3.6.3.9 + compiled with TopsPlatform 1.7.2.21
(WorkerDict pid=170436) Warning: Failed to set NUMA affinity: NVML Shared Library Not Found
(WorkerDict pid=170437) Warning: Failed to set NUMA affinity: NVML Shared Library Not Found
(WorkerDict pid=170434) After FSDP, memory allocated (GB): 0.00, memory reserved (GB): 1.46, device memory used/total (GB): 4.33/144.00
(WorkerDict pid=170435) Warning: Failed to set NUMA affinity: NVML Shared Library Not Found
(WorkerDict pid=170434) Warning: Failed to set NUMA affinity: NVML Shared Library Not Found
(WorkerDict pid=170436) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention
(WorkerDict pid=170436) Skipping monkey patch for Qwen3ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch
(WorkerDict pid=170437) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention
(WorkerDict pid=170437) Skipping monkey patch for Qwen3ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch
(WorkerDict pid=170434) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention
(WorkerDict pid=170434) Skipping monkey patch for Qwen3ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch
(WorkerDict pid=170434) Qwen3ForCausalLM contains 596.05M parameters
(WorkerDict pid=170434) Before FSDP, memory allocated (GB): 0.00, memory reserved (GB): 1.46, device memory used/total (GB): 4.33/144.00
(WorkerDict pid=170435) Monkey patch _flash_attention_forward in transformers.integrations.flash_attention
(WorkerDict pid=170435) Skipping monkey patch for Qwen3ForCausalLM as use_fused_kernels is False or fused_kernels_backend is torch
(WorkerDict pid=170436) [Gloo] Rank 2 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
(WorkerDict pid=170436) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(WorkerDict pid=170436) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(WorkerDict pid=170437) [Gloo] Rank 3 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
(WorkerDict pid=170437) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(WorkerDict pid=170437) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(WorkerDict pid=170434) After FSDP, memory allocated (GB): 0.56, memory reserved (GB): 2.04, device memory used/total (GB): 4.91/144.00
(WorkerDict pid=170434) Total steps: 1740, num_warmup_steps: 0
(WorkerDict pid=170434) [Gloo] Rank 0 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
(WorkerDict pid=170434) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(WorkerDict pid=170434) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(WorkerDict pid=170435) [Gloo] Rank 1 is connected to 3 peer ranks. Expected number of connected peer ranks is : 3
(WorkerDict pid=170435) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(WorkerDict pid=170435) [Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(pid=171895) +++++++++++++++++transformer_engine...........
(pid=171896) +++++++++++++++++transformer_engine...........
(pid=171898) +++++++++++++++++transformer_engine...........
(pid=171892) +++++++++++++++++transformer_engine...........
(pid=171893) +++++++++++++++++transformer_engine...........
(pid=171894) +++++++++++++++++transformer_engine...........
(pid=171897) +++++++++++++++++transformer_engine...........
(pid=171891) +++++++++++++++++transformer_engine...........
(pid=172375) +++++++++++++++++transformer_engine...........
(pid=172376) +++++++++++++++++transformer_engine...........
(pid=172377) +++++++++++++++++transformer_engine...........
(pid=172378) +++++++++++++++++transformer_engine...........
(pid=171894) Megatron-LM-FL Platform: cpu Registered
(pid=171896) Megatron-LM-FL Platform: cpu Registered
(pid=171894) Megatron-LM-FL Platform: cuda Registered
(pid=171894) Megatron-LM-FL Platform: enflame Registered
(pid=171894) Megatron-LM-FL Platform: cuda Selected
(pid=171896) Megatron-LM-FL Platform: cuda Registered
(pid=171896) Megatron-LM-FL Platform: enflame Registered
(pid=171896) Megatron-LM-FL Platform: cuda Selected
(pid=171897) Megatron-LM-FL Platform: cpu Registered
(pid=171897) Megatron-LM-FL Platform: cuda Registered
(pid=171897) Megatron-LM-FL Platform: enflame Registered
(pid=171897) Megatron-LM-FL Platform: cuda Selected
(pid=171898) Megatron-LM-FL Platform: cpu Registered
(pid=171898) Megatron-LM-FL Platform: cuda Registered
(pid=171898) Megatron-LM-FL Platform: enflame Registered
(pid=171898) Megatron-LM-FL Platform: cuda Selected
(pid=171895) Megatron-LM-FL Platform: cpu Registered
(pid=171895) Megatron-LM-FL Platform: cuda Registered
(pid=171895) Megatron-LM-FL Platform: enflame Registered
(pid=171895) Megatron-LM-FL Platform: cuda Selected
(pid=171892) Megatron-LM-FL Platform: cpu Registered
(pid=171892) Megatron-LM-FL Platform: cuda Registered
(pid=171892) Megatron-LM-FL Platform: enflame Registered
(pid=171892) Megatron-LM-FL Platform: cuda Selected
(pid=171891) Megatron-LM-FL Platform: cpu Registered
(pid=171891) Megatron-LM-FL Platform: cuda Registered
(pid=171891) Megatron-LM-FL Platform: enflame Registered
(pid=171891) Megatron-LM-FL Platform: cuda Selected
(pid=171893) Megatron-LM-FL Platform: cpu Registered
(pid=171893) Megatron-LM-FL Platform: cuda Registered
(pid=171893) Megatron-LM-FL Platform: enflame Registered
(pid=171893) Megatron-LM-FL Platform: cuda Selected
(pid=172376) Megatron-LM-FL Platform: cpu Registered
(pid=172376) Megatron-LM-FL Platform: cuda Registered
(pid=172376) Megatron-LM-FL Platform: enflame Registered
(pid=172376) Megatron-LM-FL Platform: cuda Selected
(pid=172377) Megatron-LM-FL Platform: cpu Registered
(pid=172377) Megatron-LM-FL Platform: cuda Registered
(pid=172377) Megatron-LM-FL Platform: enflame Registered
(pid=172377) Megatron-LM-FL Platform: cuda Selected
(pid=172378) Megatron-LM-FL Platform: cpu Registered
(pid=171896) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=171896)   from distutils.dep_util import newer, newer_group
(pid=171894) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=171894)   from distutils.dep_util import newer, newer_group
(pid=171898) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171898)   warnings.warn(msg, RuntimeWarning)
(pid=171892) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=171892)   from distutils.dep_util import newer, newer_group
(pid=171897) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171897)   warnings.warn(msg, RuntimeWarning)
(pid=171897) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=171897)   from distutils.dep_util import newer, newer_group
(pid=171895) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171895)   warnings.warn(msg, RuntimeWarning)
(pid=171898) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=171898)   from distutils.dep_util import newer, newer_group
(pid=172375) /home/xijun.gong/icode/Megatron-LM-FL/megatron/plugin/hetero/parallel_context.py:19: ImportWarning: flagcx is not installed, you can't use flagcx backend for communication.
(pid=172375)   warnings.warn(
(pid=171895) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=171895)   from distutils.dep_util import newer, newer_group
(pid=171893) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171893)   warnings.warn(msg, RuntimeWarning)
(pid=171893) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=171893)   from distutils.dep_util import newer, newer_group
(pid=171891) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=171891)   warnings.warn(msg, RuntimeWarning)
(pid=171891) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=171891)   from distutils.dep_util import newer, newer_group
(pid=172376) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=172376)   warnings.warn(msg, RuntimeWarning)
(pid=172376) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=172376)   from distutils.dep_util import newer, newer_group
(pid=172377) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=172377)   warnings.warn(msg, RuntimeWarning)
(pid=172377) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=172377)   from distutils.dep_util import newer, newer_group
(pid=172375) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=172375)   warnings.warn(msg, RuntimeWarning)
(pid=172375) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=172375)   from distutils.dep_util import newer, newer_group
(pid=172378) /usr/local/lib/python3.12/dist-packages/torch_gcu/transfer_to_gcu.py:310: RuntimeWarning: torch.jit.script will be disabled by transfer_to_gcu, which currently does not support it.
(pid=172378)   warnings.warn(msg, RuntimeWarning)
(pid=172378) /usr/local/lib/python3.12/dist-packages/Cython/Distutils/old_build_ext.py:15: DeprecationWarning: dep_util is Deprecated. Use functions from setuptools instead.
(pid=172378)   from distutils.dep_util import newer, newer_group
(pid=171897) INFO:2026-06-23 08:48:10,186:Registered engines: fsdp_flagos
(pid=171895) INFO:2026-06-23 08:48:10,205:Registered engines: fsdp_flagos
(pid=171894) INFO:2026-06-23 08:48:10,202:Registered engines: fsdp_flagos
(pid=171898) INFO:2026-06-23 08:48:10,180:Registered engines: fsdp_flagos
(pid=171893) INFO:2026-06-23 08:48:10,186:Registered engines: fsdp_flagos
(pid=172375) INFO:2026-06-23 08:48:10,187:Registered engines: fsdp_flagos
(pid=172376) INFO:2026-06-23 08:48:10,179:Registered engines: fsdp_flagos
(pid=172378) INFO:2026-06-23 08:48:10,205:Registered engines: fsdp_flagos
(pid=171896) INFO:2026-06-23 08:48:10,205:Registered engines: fsdp_flagos
(pid=171891) INFO:2026-06-23 08:48:10,211:Registered engines: fsdp_flagos
(pid=171892) INFO:2026-06-23 08:48:10,211:Registered engines: fsdp_flagos
(pid=171896) INFO:2026-06-23 08:48:10,229:Registered engines: megatron_flagos
(pid=171896) INFO:2026-06-23 08:48:10,255:Registered engines: fsdp_xpu
(pid=171896) INFO:2026-06-23 08:48:10,281:Registered engines: megatron_xpu
(pid=171896) INFO:2026-06-23 08:48:10,305:Registered engines: fsdp_mlu
(pid=171897) INFO:2026-06-23 08:48:10,234:Registered engines: megatron_flagos
(pid=171897) INFO:2026-06-23 08:48:10,258:Registered engines: fsdp_xpu
(pid=171897) INFO:2026-06-23 08:48:10,283:Registered engines: megatron_xpu
(pid=171897) INFO:2026-06-23 08:48:10,307:Registered engines: fsdp_mlu
(pid=171895) INFO:2026-06-23 08:48:10,230:Registered engines: megatron_flagos
(pid=171895) INFO:2026-06-23 08:48:10,256:Registered engines: fsdp_xpu
(pid=171895) INFO:2026-06-23 08:48:10,281:Registered engines: megatron_xpu
(pid=171895) INFO:2026-06-23 08:48:10,306:Registered engines: fsdp_mlu
(pid=171891) INFO:2026-06-23 08:48:10,236:Registered engines: megatron_flagos
(pid=171891) INFO:2026-06-23 08:48:10,260:Registered engines: fsdp_xpu
(pid=171891) INFO:2026-06-23 08:48:10,284:Registered engines: megatron_xpu
(pid=171891) INFO:2026-06-23 08:48:10,309:Registered engines: fsdp_mlu
(pid=171894) INFO:2026-06-23 08:48:10,251:Registered engines: megatron_flagos
(pid=171894) INFO:2026-06-23 08:48:10,275:Registered engines: fsdp_xpu
(pid=171894) INFO:2026-06-23 08:48:10,299:Registered engines: megatron_xpu
(pid=171898) INFO:2026-06-23 08:48:10,230:Registered engines: megatron_flagos
(pid=171898) INFO:2026-06-23 08:48:10,256:Registered engines: fsdp_xpu
(pid=171898) INFO:2026-06-23 08:48:10,281:Registered engines: megatron_xpu
(pid=171898) INFO:2026-06-23 08:48:10,306:Registered engines: fsdp_mlu
(pid=171892) INFO:2026-06-23 08:48:10,234:Registered engines: megatron_flagos
(pid=171892) INFO:2026-06-23 08:48:10,258:Registered engines: fsdp_xpu
(pid=171892) INFO:2026-06-23 08:48:10,283:Registered engines: megatron_xpu
(pid=171892) INFO:2026-06-23 08:48:10,307:Registered engines: fsdp_mlu
(pid=171893) INFO:2026-06-23 08:48:10,234:Registered engines: megatron_flagos
(pid=171893) INFO:2026-06-23 08:48:10,258:Registered engines: fsdp_xpu
(pid=171893) INFO:2026-06-23 08:48:10,283:Registered engines: megatron_xpu
(pid=171893) INFO:2026-06-23 08:48:10,307:Registered engines: fsdp_mlu
(pid=172375) INFO:2026-06-23 08:48:10,239:Registered engines: megatron_flagos
(pid=172375) INFO:2026-06-23 08:48:10,264:Registered engines: fsdp_xpu
(pid=172375) INFO:2026-06-23 08:48:10,289:Registered engines: megatron_xpu
(pid=172375) INFO:2026-06-23 08:48:10,313:Registered engines: fsdp_mlu
(pid=172376) INFO:2026-06-23 08:48:10,228:Registered engines: megatron_flagos
(pid=172376) INFO:2026-06-23 08:48:10,255:Registered engines: fsdp_xpu
(pid=172376) INFO:2026-06-23 08:48:10,281:Registered engines: megatron_xpu
(pid=172376) INFO:2026-06-23 08:48:10,306:Registered engines: fsdp_mlu
(pid=172377) INFO:2026-06-23 08:48:10,225:Registered engines: fsdp_flagos
(pid=172377) INFO:2026-06-23 08:48:10,249:Registered engines: megatron_flagos
(pid=172377) INFO:2026-06-23 08:48:10,274:Registered engines: fsdp_xpu
(pid=172377) INFO:2026-06-23 08:48:10,299:Registered engines: megatron_xpu
(pid=172377) INFO:2026-06-23 08:48:10,323:Registered engines: fsdp_mlu
(pid=172378) INFO:2026-06-23 08:48:10,230:Registered engines: megatron_flagos
(pid=172378) INFO:2026-06-23 08:48:10,256:Registered engines: fsdp_xpu
(pid=172378) INFO:2026-06-23 08:48:10,281:Registered engines: megatron_xpu
(pid=172378) INFO:2026-06-23 08:48:10,306:Registered engines: fsdp_mlu
(pid=171894) INFO:2026-06-23 08:48:10,324:Registered engines: fsdp_mlu
(pid=171896) INFO:2026-06-23 08:48:10,331:Registered engines: megatron_mlu
(pid=171896) INFO:2026-06-23 08:48:10,357:Registered engines: fsdp_metax
(pid=171896) INFO:2026-06-23 08:48:10,382:Registered engines: megatron_metax
(pid=171896) INFO:2026-06-23 08:48:10,407:Registered engines: fsdp_enflame
(pid=171896) INFO:2026-06-23 08:48:10,432:Registered engines: megatron_enflame
(pid=171896) INFO:2026-06-23 08:48:10,432:verl-hardware-plugin loaded successfully
(pid=171897) INFO:2026-06-23 08:48:10,333:Registered engines: megatron_mlu
(pid=171897) INFO:2026-06-23 08:48:10,357:Registered engines: fsdp_metax
(pid=171897) INFO:2026-06-23 08:48:10,382:Registered engines: megatron_metax
(pid=171897) INFO:2026-06-23 08:48:10,407:Registered engines: fsdp_enflame
(pid=171897) INFO:2026-06-23 08:48:10,432:Registered engines: megatron_enflame
(pid=171897) INFO:2026-06-23 08:48:10,432:verl-hardware-plugin loaded successfully
(pid=171895) INFO:2026-06-23 08:48:10,332:Registered engines: megatron_mlu
(pid=171895) INFO:2026-06-23 08:48:10,357:Registered engines: fsdp_metax
(pid=171895) INFO:2026-06-23 08:48:10,382:Registered engines: megatron_metax
(pid=171895) INFO:2026-06-23 08:48:10,407:Registered engines: fsdp_enflame
(pid=171895) INFO:2026-06-23 08:48:10,432:Registered engines: megatron_enflame
(pid=171895) INFO:2026-06-23 08:48:10,432:verl-hardware-plugin loaded successfully
(pid=171891) INFO:2026-06-23 08:48:10,334:Registered engines: megatron_mlu
(pid=171891) INFO:2026-06-23 08:48:10,358:Registered engines: fsdp_metax
(pid=171891) INFO:2026-06-23 08:48:10,382:Registered engines: megatron_metax
(pid=171891) INFO:2026-06-23 08:48:10,407:Registered engines: fsdp_enflame
(pid=171891) INFO:2026-06-23 08:48:10,432:Registered engines: megatron_enflame
(pid=171891) INFO:2026-06-23 08:48:10,432:verl-hardware-plugin loaded successfully
(pid=171894) INFO:2026-06-23 08:48:10,348:Registered engines: megatron_mlu
(pid=171894) INFO:2026-06-23 08:48:10,374:Registered engines: fsdp_metax
(pid=171894) INFO:2026-06-23 08:48:10,399:Registered engines: megatron_metax
(pid=171894) INFO:2026-06-23 08:48:10,426:Registered engines: fsdp_enflame
(pid=171898) INFO:2026-06-23 08:48:10,332:Registered engines: megatron_mlu
(pid=171898) INFO:2026-06-23 08:48:10,357:Registered engines: fsdp_metax
(pid=171898) INFO:2026-06-23 08:48:10,382:Registered engines: megatron_metax
(pid=171898) INFO:2026-06-23 08:48:10,407:Registered engines: fsdp_enflame
(pid=171898) INFO:2026-06-23 08:48:10,431:Registered engines: megatron_enflame
(pid=171898) INFO:2026-06-23 08:48:10,431:verl-hardware-plugin loaded successfully
(pid=171892) INFO:2026-06-23 08:48:10,333:Registered engines: megatron_mlu
(pid=171892) INFO:2026-06-23 08:48:10,357:Registered engines: fsdp_metax
(pid=171892) INFO:2026-06-23 08:48:10,382:Registered engines: megatron_metax
(pid=171892) INFO:2026-06-23 08:48:10,407:Registered engines: fsdp_enflame
(pid=171892) INFO:2026-06-23 08:48:10,432:Registered engines: megatron_enflame
(pid=171892) INFO:2026-06-23 08:48:10,432:verl-hardware-plugin loaded successfully
(pid=171893) INFO:2026-06-23 08:48:10,333:Registered engines: megatron_mlu
(pid=171893) INFO:2026-06-23 08:48:10,357:Registered engines: fsdp_metax
(pid=171893) INFO:2026-06-23 08:48:10,382:Registered engines: megatron_metax
(pid=171893) INFO:2026-06-23 08:48:10,407:Registered engines: fsdp_enflame
(pid=171893) INFO:2026-06-23 08:48:10,432:Registered engines: megatron_enflame
(pid=171893) INFO:2026-06-23 08:48:10,432:verl-hardware-plugin loaded successfully
(pid=172375) INFO:2026-06-23 08:48:10,336:Registered engines: megatron_mlu
(pid=172375) INFO:2026-06-23 08:48:10,361:Registered engines: fsdp_metax
(pid=172375) INFO:2026-06-23 08:48:10,387:Registered engines: megatron_metax
(pid=172375) INFO:2026-06-23 08:48:10,412:Registered engines: fsdp_enflame
(pid=172375) INFO:2026-06-23 08:48:10,436:Registered engines: megatron_enflame
(pid=172375) INFO:2026-06-23 08:48:10,436:verl-hardware-plugin loaded successfully
(pid=172376) INFO:2026-06-23 08:48:10,331:Registered engines: megatron_mlu
(pid=172376) INFO:2026-06-23 08:48:10,357:Registered engines: fsdp_metax
(pid=172376) INFO:2026-06-23 08:48:10,382:Registered engines: megatron_metax
(pid=172376) INFO:2026-06-23 08:48:10,407:Registered engines: fsdp_enflame
(pid=172376) INFO:2026-06-23 08:48:10,431:Registered engines: megatron_enflame
(pid=172376) INFO:2026-06-23 08:48:10,431:verl-hardware-plugin loaded successfully
(pid=172377) INFO:2026-06-23 08:48:10,348:Registered engines: megatron_mlu
(pid=172377) INFO:2026-06-23 08:48:10,374:Registered engines: fsdp_metax
(pid=172377) INFO:2026-06-23 08:48:10,399:Registered engines: megatron_metax
(pid=172377) INFO:2026-06-23 08:48:10,426:Registered engines: fsdp_enflame
(pid=172378) INFO:2026-06-23 08:48:10,332:Registered engines: megatron_mlu
(pid=172378) INFO:2026-06-23 08:48:10,357:Registered engines: fsdp_metax
(pid=172378) INFO:2026-06-23 08:48:10,382:Registered engines: megatron_metax
(pid=172378) INFO:2026-06-23 08:48:10,407:Registered engines: fsdp_enflame
(pid=172378) INFO:2026-06-23 08:48:10,432:Registered engines: megatron_enflame
(pid=172378) INFO:2026-06-23 08:48:10,432:verl-hardware-plugin loaded successfully
(pid=171894) INFO:2026-06-23 08:48:10,451:Registered engines: megatron_enflame
(pid=171894) INFO:2026-06-23 08:48:10,451:verl-hardware-plugin loaded successfully
(pid=172377) INFO:2026-06-23 08:48:10,451:Registered engines: megatron_enflame
(pid=172377) INFO:2026-06-23 08:48:10,451:verl-hardware-plugin loaded successfully
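Because the registration lines from different workers interleave, it is easy to miss a worker that never reached `fsdp_enflame` or `megatron_enflame`. A minimal sketch of a log check follows, assuming only the `(pid=N) ... Registered engines: NAME` line shape seen above. The helper names `engines_by_pid` and `pids_missing` are hypothetical and not part of verl or verl-hardware-plugin, and the third sample line is deliberately left without its `megatron_enflame` partner to show a failing case.

```python
import re
from collections import defaultdict

# Matches Ray-prefixed lines such as
# "(pid=171896) INFO:...:Registered engines: fsdp_enflame".
# Anything between the prefix and the message (colour codes, timestamps)
# is skipped by the non-greedy ".*?".
REGISTERED_RE = re.compile(r"\(pid=(\d+)\).*?Registered engines: (\S+)")


def engines_by_pid(lines):
    """Group registered engine names by worker pid, in log order."""
    seen = defaultdict(list)
    for line in lines:
        match = REGISTERED_RE.search(line)
        if match:
            seen[int(match.group(1))].append(match.group(2))
    return dict(seen)


def pids_missing(lines, required):
    """Return the sorted pids that did not log every engine in `required`."""
    return sorted(
        pid
        for pid, names in engines_by_pid(lines).items()
        if not set(required) <= set(names)
    )


# Illustrative input: two lines for one worker, one line for another.
SAMPLE = [
    "(pid=171896) INFO:2026-06-23 08:48:10,407:Registered engines: fsdp_enflame",
    "(pid=171896) INFO:2026-06-23 08:48:10,432:Registered engines: megatron_enflame",
    "(pid=171894) INFO:2026-06-23 08:48:10,426:Registered engines: fsdp_enflame",
]

print(pids_missing(SAMPLE, ["fsdp_enflame", "megatron_enflame"]))
```

Run against the full log, an empty list would indicate that every worker that registered anything also registered both Enflame engines.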


@gongxijun

Author
(AgentLoopWorker pid=176540) Using dataset class: RLHFDataset
(AgentLoopWorker pid=176543) Using dataset class: RLHFDataset
(AgentLoopWorker pid=176539) Using dataset class: RLHFDataset
(AgentLoopWorker pid=176537) Using dataset class: RLHFDataset
(AgentLoopWorker pid=176538) Using dataset class: RLHFDataset
(AgentLoopWorker pid=176542) Using dataset class: RLHFDataset
(AgentLoopWorker pid=176545) Using dataset class: RLHFDataset
(AgentLoopWorker pid=176541) Using dataset class: RLHFDataset
(TaskRunner pid=168816) validation generation end
(TaskRunner pid=168816) ("Initial validation metrics: {'val-aux/openai/gsm8k/reward/mean@1': "
(TaskRunner pid=168816)  "0.3070507960576194, 'val-core/openai/gsm8k/acc/mean@1': 0.3070507960576194, "
(TaskRunner pid=168816)  "'val-aux/num_turns/min': 2, 'val-aux/num_turns/max': 2, "
(TaskRunner pid=168816)  "'val-aux/num_turns/mean': 2.0}")
(TaskRunner pid=168816) step:0 - val-aux/openai/gsm8k/reward/mean@1:0.3070507960576194 - val-core/openai/gsm8k/acc/mean@1:0.3070507960576194 - val-aux/num_turns/min:2 - val-aux/num_turns/max:2 - val-aux/num_turns/mean:2.0
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:51:28 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:51:28 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:51:28 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:51:28 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(TaskRunner pid=168816) step:1 - global_seqlen/min:63051 - global_seqlen/max:71719 - global_seqlen/minmax_diff:8668 - global_seqlen/balanced_min:67385 - global_seqlen/balanced_max:67386 - global_seqlen/mean:67385.75 - actor/entropy:0.4889897108078003 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.3933042287826538 - training/rollout_probs_diff_mean:0.005766649730503559 - training/rollout_probs_diff_std:0.010471085086464882 - training/rollout_actor_probs_pearson_corr:0.9992210865020752 - rollout_corr/training_ppl:1.6072309017181396 - rollout_corr/training_log_ppl:0.462712824344635 - rollout_corr/kl:0.0007474650628864765 - rollout_corr/k3_kl:0.0007594344788230956 - rollout_corr/rollout_ppl:1.605966329574585 - rollout_corr/rollout_log_ppl:0.46193426847457886 - rollout_corr/log_ppl_diff:0.0007785527850501239 - rollout_corr/log_ppl_abs_diff:0.001300558215007186 - rollout_corr/log_ppl_diff_max:0.006979227066040039 - rollout_corr/log_ppl_diff_min:-0.003345966339111328 - rollout_corr/ppl_ratio:1.0007799863815308 - rollout_corr/chi2_token:0.0015859603881835938 - rollout_corr/chi2_seq:3.8478870391845703 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.03940857567067724 - actor/kl_loss:0.0 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.03940857946872711 - actor/grad_norm:0.398193895816803 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:9.95748233795166 - actor/perf/max_memory_reserved_gb:63.599609375 - actor/perf/cpu_memory_used_gb:245.17737579345703 - perf/mfu/actor:0.0 - training/global_step:1 - training/epoch:0 - critic/score/mean:0.21250000596046448 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.21250000596046448 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.03940857574343681 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.03940857574343681 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:755.6500244140625 - response_length/max:1024.0 - response_length/min:179.0 - response_length/clip_ratio:0.37812501192092896 - response_length_non_aborted/mean:755.6500244140625 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:179.0 - response_length_non_aborted/clip_ratio:0.37812501192092896 - response/aborted_ratio:0.0 - prompt_length/mean:86.671875 - prompt_length/max:163.0 - prompt_length/min:53.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:0.00020868800129392184 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.9093668069981504 - timing_s/agent_loop/generate_sequences/max:16.25717139900007 - timing_s/agent_loop/generate_sequences/mean:11.917300356584416 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0027659990009851754 - timing_s/agent_loop/compute_score/max:0.027189693002583226 - timing_s/agent_loop/compute_score/mean:0.005035598634378857 - timing_s/agent_loop/slowest/generate_sequences:16.253934618998755 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.01085678700110293 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:88 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.395266917999834 - timing_s/reward:2.407300053164363e-05 - timing_s/old_log_prob:6.7439335550006945 - timing_s/ref:3.3616558169997006 - timing_s/adv:0.016000278999854345 - timing_s/update_actor:10.243421014998603 - timing_s/update_weights:2.0304838019983436 - timing_s/step:38.83326870199744 - timing_s/stop_profile:3.9447000744985417e-05 - timing_per_token_ms/gen:0.0678028308327261 - timing_per_token_ms/ref:0.012471686584328663 - timing_per_token_ms/update_actor:0.038002919812418066 - timing_per_token_ms/adv:5.93607661851888e-05 - perf/total_num_tokens:269543 - perf/time_per_step:38.83326870199744 - perf/throughput:1735.258252842721
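The per-step line is a flat list of `key:value` pairs joined by ` - `, so it can be turned into a dictionary for plotting `critic/rewards/mean` or for sanity checks. The sketch below is illustrative only: `parse_step_line` is a hypothetical helper, not a verl API, and it expects the whole step record on one line. For the step 1 values above, `perf/throughput` agrees numerically with `perf/total_num_tokens / perf/time_per_step / 4`; the divisor of 4 is inferred from those numbers and is an assumption, not something the log states.

```python
import re

# Ray colours its "(TaskRunner pid=...)" prefix. In a paste the ESC byte may
# survive as "\x1b" or be replaced by U+FFFD, so both forms are stripped.
ANSI_RE = re.compile(r"[\x1b\ufffd]\[[0-9;]*m")


def parse_step_line(line):
    """Turn 'step:1 - key:value - ...' into a {key: float} dict."""
    text = ANSI_RE.sub("", line)
    start = text.find("step:")
    if start < 0:
        raise ValueError("no 'step:' field found")
    metrics = {}
    for field in text[start:].split(" - "):
        # Split on the last colon; negative values such as "-0.03" stay
        # attached to their key because there is no space after the colon.
        key, sep, value = field.strip().rpartition(":")
        if sep:
            metrics[key] = float(value)
    return metrics


# A shortened copy of the step 1 record, keeping only a few fields.
SAMPLE = (
    "(TaskRunner pid=168816) step:1 - actor/lr:1e-06 "
    "- critic/advantages/mean:-0.03940857574343681 "
    "- perf/total_num_tokens:269543 "
    "- perf/time_per_step:38.83326870199744 "
    "- perf/throughput:1735.258252842721"
)

metrics = parse_step_line(SAMPLE)

# Assumption: 4 devices, inferred from the logged numbers, not stated in the log.
NUM_DEVICES = 4
per_device = (
    metrics["perf/total_num_tokens"] / metrics["perf/time_per_step"] / NUM_DEVICES
)
print(round(per_device, 3), round(metrics["perf/throughput"], 3))
```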
(TaskRunner pid=168816) Training Progress:   0%|          | 2/1740 [01:12<17:20:55, 35.94s/it]
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   0%|          | 3/1740 [01:47<16:58:02, 35.17s/it]
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:52:02 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:52:02 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:52:02 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:52:02 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(TaskRunner pid=168816) step:2 - global_seqlen/min:64114 - global_seqlen/max:71507 - global_seqlen/minmax_diff:7393 - global_seqlen/balanced_min:68136 - global_seqlen/balanced_max:68139 - global_seqlen/mean:68137.75 - actor/entropy:0.4584938883781433 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.18687888979911804 - training/rollout_probs_diff_mean:0.0055571491830050945 - training/rollout_probs_diff_std:0.01030874252319336 - training/rollout_actor_probs_pearson_corr:0.9992917776107788 - rollout_corr/training_ppl:1.5591614246368408 - rollout_corr/training_log_ppl:0.43473583459854126 - rollout_corr/kl:0.0007253456860780716 - rollout_corr/k3_kl:0.0007104825344868004 - rollout_corr/rollout_ppl:1.5579941272735596 - rollout_corr/rollout_log_ppl:0.43398362398147583 - rollout_corr/log_ppl_diff:0.0007522336672991514 - rollout_corr/log_ppl_abs_diff:0.0011922449339181185 - rollout_corr/log_ppl_diff_max:0.004928380250930786 - rollout_corr/log_ppl_diff_min:-0.0038859546184539795 - rollout_corr/ppl_ratio:1.000753402709961 - rollout_corr/chi2_token:0.0013997554779052734 - rollout_corr/chi2_seq:1.0681242942810059 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.04183923451273586 - actor/kl_loss:0.0006537040105740743 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.04183989018201828 - actor/grad_norm:0.334147185087204 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.350535869598389 - actor/perf/max_memory_reserved_gb:64.83203125 - actor/perf/cpu_memory_used_gb:245.18866348266602 - perf/mfu/actor:0.0 - training/global_step:2 - training/epoch:0 - critic/score/mean:0.21562500298023224 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.21562500298023224 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04183923080563545 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.04183923080563545 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:769.0187377929688 - response_length/max:1024.0 - response_length/min:243.0 - response_length/clip_ratio:0.40312498807907104 - response_length_non_aborted/mean:769.0187377929688 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:243.0 - response_length_non_aborted/clip_ratio:0.40312498807907104 - response/aborted_ratio:0.0 - prompt_length/mean:82.703125 - prompt_length/max:162.0 - prompt_length/min:55.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.63719989056699e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.887915391998831 - timing_s/agent_loop/generate_sequences/max:16.3203601430032 - timing_s/agent_loop/generate_sequences/mean:12.091281647743722 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0027879609988303855 - timing_s/agent_loop/compute_score/max:0.016882853000424802 - timing_s/agent_loop/compute_score/mean:0.004583399821865441 - timing_s/agent_loop/slowest/generate_sequences:16.3203601430032 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.003169589999743039 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:106 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.434708601002058 - timing_s/reward:1.542999962111935e-05 - timing_s/old_log_prob:2.9756809059981606 - timing_s/ref:2.734038849001081 - timing_s/adv:0.013889093999750912 - timing_s/update_actor:9.499376952000603 - timing_s/update_weights:2.0931781939980283 - timing_s/step:33.77874364499803 - timing_s/stop_profile:4.04050006181933e-05 - timing_per_token_ms/gen:0.06678441114489267 - timing_per_token_ms/ref:0.010031292671834193 - timing_per_token_ms/update_actor:0.03485357585186113 - timing_per_token_ms/adv:5.095961489684834e-05 - perf/total_num_tokens:272551 - perf/time_per_step:33.77874364499803 - perf/throughput:2017.1783390200146
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:52:36 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:52:36 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:52:36 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:52:36 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   0%|          | 4/1740 [02:21<16:54:01, 35.05s/it]
(TaskRunner pid=168816) step:3 - global_seqlen/min:59856 - global_seqlen/max:68105 - global_seqlen/minmax_diff:8249 - global_seqlen/balanced_min:64554 - global_seqlen/balanced_max:64555 - global_seqlen/mean:64554.5 - actor/entropy:0.46803849935531616 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.2589915990829468 - training/rollout_probs_diff_mean:0.005720236804336309 - training/rollout_probs_diff_std:0.010552444495260715 - training/rollout_actor_probs_pearson_corr:0.9992521405220032 - rollout_corr/training_ppl:1.5703747272491455 - rollout_corr/training_log_ppl:0.44161930680274963 - rollout_corr/kl:0.0007170534809119999 - rollout_corr/k3_kl:0.00074957957258448 - rollout_corr/rollout_ppl:1.569217324256897 - rollout_corr/rollout_log_ppl:0.44090285897254944 - rollout_corr/log_ppl_diff:0.0007164609269239008 - rollout_corr/log_ppl_abs_diff:0.0013619984965771437 - rollout_corr/log_ppl_diff_max:0.00587540864944458 - rollout_corr/log_ppl_diff_min:-0.006187856197357178 - rollout_corr/ppl_ratio:1.0007178783416748 - rollout_corr/chi2_token:0.0015749931335449219 - rollout_corr/chi2_seq:2.963491916656494 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.026926075895971735 - actor/kl_loss:0.0007013690451458388 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.02692677639424801 - actor/grad_norm:0.38825637102127075 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.38784646987915 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.16300582885742 - perf/mfu/actor:0.0 - training/global_step:3 - training/epoch:0 - critic/score/mean:0.2593750059604645 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.2593750059604645 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.02692607045173645 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.02692607045173645 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:725.4937744140625 - response_length/max:1024.0 - response_length/min:226.0 - response_length/clip_ratio:0.36250001192092896 - response_length_non_aborted/mean:725.4937744140625 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:226.0 - response_length_non_aborted/clip_ratio:0.36250001192092896 - response/aborted_ratio:0.0 - prompt_length/mean:81.4375 - prompt_length/max:181.0 - prompt_length/min:52.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.9700000090524554e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.6150009769989992 - timing_s/agent_loop/generate_sequences/max:17.016768083998613 - timing_s/agent_loop/generate_sequences/mean:11.467853752059408 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0026959999995597173 - timing_s/agent_loop/compute_score/max:0.013897700999223161 - timing_s/agent_loop/compute_score/mean:0.0046415969625400065 - timing_s/agent_loop/slowest/generate_sequences:17.016768083998613 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.003372287999809487 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:83 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:17.128874473997712 - timing_s/reward:1.3812001270707697e-05 - timing_s/old_log_prob:2.9187745509989327 - timing_s/ref:2.6711874310021813 - timing_s/adv:0.012504370999522507 - timing_s/update_actor:9.327601043998584 - timing_s/update_weights:2.1560128390010505 - timing_s/step:34.243371418000606 - timing_s/stop_profile:5.278800017549656e-05 - timing_per_token_ms/gen:0.07378110801263671 - timing_per_token_ms/ref:0.010344698785530758 - timing_per_token_ms/update_actor:0.03612296990914105 - timing_per_token_ms/adv:4.84256364758557e-05 - perf/total_num_tokens:258218 - perf/time_per_step:34.243371418000606 - perf/throughput:1885.1677661057006
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:53:11 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:53:11 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:53:11 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:53:11 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   0%|          | 5/1740 [03:27<22:07:44, 45.92s/it]
(TaskRunner pid=168816) step:4 - global_seqlen/min:63556 - global_seqlen/max:70224 - global_seqlen/minmax_diff:6668 - global_seqlen/balanced_min:66942 - global_seqlen/balanced_max:66947 - global_seqlen/mean:66943.5 - actor/entropy:0.4902154207229614 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.30109703540802 - training/rollout_probs_diff_mean:0.0057935165241360664 - training/rollout_probs_diff_std:0.01048018503934145 - training/rollout_actor_probs_pearson_corr:0.9992663264274597 - rollout_corr/training_ppl:1.598647117614746 - rollout_corr/training_log_ppl:0.45819202065467834 - rollout_corr/kl:0.0006666513509117067 - rollout_corr/k3_kl:0.0007607945008203387 - rollout_corr/rollout_ppl:1.5975453853607178 - rollout_corr/rollout_log_ppl:0.4575042724609375 - rollout_corr/log_ppl_diff:0.0006876903935335577 - rollout_corr/log_ppl_abs_diff:0.0013383477926254272 - rollout_corr/log_ppl_diff_max:0.004501402378082275 - rollout_corr/log_ppl_diff_min:-0.006224274635314941 - rollout_corr/ppl_ratio:1.000689148902893 - rollout_corr/chi2_token:0.0017151832580566406 - rollout_corr/chi2_seq:2.2878284454345703 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.02125293153221719 - actor/kl_loss:0.0008155809907748335 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.02125374786555767 - actor/grad_norm:0.37522414326667786 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.1883087158203 - perf/mfu/actor:0.0 - training/global_step:4 - training/epoch:0 - critic/score/mean:0.28125 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.28125 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.021252932026982307 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.021252932026982307 - 
critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:745.1687622070312 - response_length/max:1024.0 - response_length/min:248.0 - response_length/clip_ratio:0.359375 - response_length_non_aborted/mean:745.1687622070312 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:248.0 - response_length_non_aborted/clip_ratio:0.359375 - response/aborted_ratio:0.0 - prompt_length/mean:91.625 - prompt_length/max:201.0 - prompt_length/min:52.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.272499765851535e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.951559258999623 - timing_s/agent_loop/generate_sequences/max:17.652287665001495 - timing_s/agent_loop/generate_sequences/mean:11.921665323253047 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0027221429991186596 - timing_s/agent_loop/compute_score/max:0.0224187140011054 - timing_s/agent_loop/compute_score/mean:0.004713755993850555 - timing_s/agent_loop/slowest/generate_sequences:17.652287665001495 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.0031583210002281703 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:152 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:17.766643165999994 - timing_s/reward:1.7399001080775633e-05 - timing_s/old_log_prob:2.9458179009998275 - timing_s/ref:2.6948801270009426 - timing_s/adv:0.01248715699694003 - timing_s/update_actor:9.344790188999468 - timing_s/update_weights:2.0685312889982015 - timing_s/step:34.860568631000206 - timing_s/stop_profile:3.6181001632940024e-05 - timing_per_token_ms/gen:0.074507633195501 - 
timing_per_token_ms/ref:0.010064009676073639 - timing_per_token_ms/update_actor:0.03489804906002625 - timing_per_token_ms/adv:4.663319439878416e-05 - perf/total_num_tokens:267774 - perf/time_per_step:34.860568631000206 - perf/throughput:1920.321515939635
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:53:46 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:53:46 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:53:46 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:53:46 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(TaskRunner pid=168816) test_gen_batch meta info: {'eos_token_id': 151645, 'pad_token_id': 151643, 'recompute_log_prob': False, 'do_sample': False, 'validate': True, 'global_steps': 5}
(TaskRunner pid=168816) validation generation end
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   0%|          | 6/1740 [04:00<20:05:19, 41.71s/it]
(TaskRunner pid=168816) step:5 - global_seqlen/min:59788 - global_seqlen/max:71266 - global_seqlen/minmax_diff:11478 - global_seqlen/balanced_min:65235 - global_seqlen/balanced_max:65239 - global_seqlen/mean:65237.0 - actor/entropy:0.47839999198913574 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.32826268672943115 - training/rollout_probs_diff_mean:0.005746937356889248 - training/rollout_probs_diff_std:0.010497118346393108 - training/rollout_actor_probs_pearson_corr:0.9992486834526062 - rollout_corr/training_ppl:1.5879628658294678 - rollout_corr/training_log_ppl:0.4524807929992676 - rollout_corr/kl:0.000686793529894203 - rollout_corr/k3_kl:0.0007509776623919606 - rollout_corr/rollout_ppl:1.586805820465088 - rollout_corr/rollout_log_ppl:0.45177680253982544 - rollout_corr/log_ppl_diff:0.0007040216587483883 - rollout_corr/log_ppl_abs_diff:0.0013215722283348441 - rollout_corr/log_ppl_diff_max:0.005447506904602051 - rollout_corr/log_ppl_diff_min:-0.0037515759468078613 - rollout_corr/ppl_ratio:1.0007054805755615 - rollout_corr/chi2_token:0.001653432846069336 - rollout_corr/chi2_seq:1.2716131210327148 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.024799662620353047 - actor/kl_loss:0.0009008552988234442 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.024800561368465424 - actor/grad_norm:0.34697312116622925 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.14764404296875 - perf/mfu/actor:0.0 - val-aux/openai/gsm8k/reward/mean@1:0.3646702047005307 - val-core/openai/gsm8k/acc/mean@1:0.3646702047005307 - val-aux/num_turns/min:2 - val-aux/num_turns/max:2 - val-aux/num_turns/mean:2.0 - training/global_step:5 - training/epoch:0 - critic/score/mean:0.29374998807907104 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.29374998807907104 - 
critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.024799663573503494 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.024799663573503494 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:732.1031494140625 - response_length/max:1024.0 - response_length/min:197.0 - response_length/clip_ratio:0.34687501192092896 - response_length_non_aborted/mean:732.1031494140625 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:197.0 - response_length_non_aborted/clip_ratio:0.34687501192092896 - response/aborted_ratio:0.0 - prompt_length/mean:83.359375 - prompt_length/max:158.0 - prompt_length/min:52.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.564999835565686e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.167036928000016 - timing_s/agent_loop/generate_sequences/max:16.29063697500169 - timing_s/agent_loop/generate_sequences/mean:11.439248832515625 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0026915329981420655 - timing_s/agent_loop/compute_score/max:0.03656841600241023 - timing_s/agent_loop/compute_score/mean:0.005064761112453198 - timing_s/agent_loop/slowest/generate_sequences:16.29063697500169 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.003513586998451501 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:127 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:17.17750136900213 - timing_s/reward:1.5540997992502525e-05 - timing_s/old_log_prob:2.9278272810006456 - timing_s/ref:3.4965561459976016 
- timing_s/adv:0.01293163199807168 - timing_s/update_actor:9.289440725999157 - timing_s/update_weights:2.0783793080008763 - timing_s/step:35.01038901799984 - timing_s/testing:30.171545131997846 - timing_s/stop_profile:8.422599785262719e-05 - timing_per_token_ms/gen:0.07332258249564452 - timing_per_token_ms/ref:0.013399436462427769 - timing_per_token_ms/update_actor:0.03559881940462911 - timing_per_token_ms/adv:4.955635604822294e-05 - perf/total_num_tokens:260948 - perf/time_per_step:35.01038901799984 - perf/throughput:1863.3611859171228
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:54:50 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:54:50 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:54:50 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:54:50 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   0%|          | 7/1740 [04:33<18:44:04, 38.92s/it]
(TaskRunner pid=168816) step:6 - global_seqlen/min:60188 - global_seqlen/max:73847 - global_seqlen/minmax_diff:13659 - global_seqlen/balanced_min:67233 - global_seqlen/balanced_max:67236 - global_seqlen/mean:67234.5 - actor/entropy:0.498772531747818 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.3703936040401459 - training/rollout_probs_diff_mean:0.005872943438589573 - training/rollout_probs_diff_std:0.010533497668802738 - training/rollout_actor_probs_pearson_corr:0.9992924332618713 - rollout_corr/training_ppl:1.610360860824585 - rollout_corr/training_log_ppl:0.4659952223300934 - rollout_corr/kl:0.0006905316840857267 - rollout_corr/k3_kl:0.0007728732889518142 - rollout_corr/rollout_ppl:1.6092283725738525 - rollout_corr/rollout_log_ppl:0.4652949869632721 - rollout_corr/log_ppl_diff:0.0007002384518273175 - rollout_corr/log_ppl_abs_diff:0.0013785857008770108 - rollout_corr/log_ppl_diff_max:0.005278050899505615 - rollout_corr/log_ppl_diff_min:-0.004625797271728516 - rollout_corr/ppl_ratio:1.0007017850875854 - rollout_corr/chi2_token:0.0017273426055908203 - rollout_corr/chi2_seq:6.297795295715332 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.03640978061387662 - actor/kl_loss:0.0011628120519162621 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.036410942673683167 - actor/grad_norm:0.3872873783111572 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.3689308166504 - perf/mfu/actor:0.0 - training/global_step:6 - training/epoch:0 - critic/score/mean:0.3187499940395355 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.3187499940395355 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.03640977293252945 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.03640977293252945 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:755.1031494140625 - response_length/max:1024.0 - response_length/min:194.0 - response_length/clip_ratio:0.39375001192092896 - response_length_non_aborted/mean:755.1031494140625 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:194.0 - response_length_non_aborted/clip_ratio:0.39375001192092896 - response/aborted_ratio:0.0 - prompt_length/mean:85.328125 - prompt_length/max:180.0 - prompt_length/min:44.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.83839976368472e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.1320026080029493 - timing_s/agent_loop/generate_sequences/max:16.14215050799976 - timing_s/agent_loop/generate_sequences/mean:11.899879527453084 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002829618999385275 - timing_s/agent_loop/compute_score/max:0.018797126002027653 - timing_s/agent_loop/compute_score/mean:0.004786367431245253 - timing_s/agent_loop/slowest/generate_sequences:16.14215050799976 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.003322707001643721 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:112 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.283830902000773 - timing_s/reward:1.4116998499957845e-05 - timing_s/old_log_prob:2.965779376001592 - timing_s/ref:2.708921872999781 - timing_s/adv:0.01679790400157799 - timing_s/update_actor:9.41649524499735 - timing_s/update_weights:2.1071457420002844 - timing_s/step:33.528658192000876 - 
timing_s/stop_profile:3.5570999898482114e-05 - timing_per_token_ms/gen:0.06739075747932101 - timing_per_token_ms/ref:0.010072663115661531 - timing_per_token_ms/update_actor:0.03501362858724818 - timing_per_token_ms/adv:6.246013579924737e-05 - perf/total_num_tokens:268938 - perf/time_per_step:33.528658192000876 - perf/throughput:2005.284542404996
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:55:23 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:55:23 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:55:23 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:55:23 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   0%|          | 8/1740 [05:06<17:50:26, 37.08s/it]
(TaskRunner pid=168816) step:7 - global_seqlen/min:55226 - global_seqlen/max:67584 - global_seqlen/minmax_diff:12358 - global_seqlen/balanced_min:63207 - global_seqlen/balanced_max:63209 - global_seqlen/mean:63208.0 - actor/entropy:0.4789058566093445 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.2051597237586975 - training/rollout_probs_diff_mean:0.005755679681897163 - training/rollout_probs_diff_std:0.010506547056138515 - training/rollout_actor_probs_pearson_corr:0.9992757439613342 - rollout_corr/training_ppl:1.5799479484558105 - rollout_corr/training_log_ppl:0.44747018814086914 - rollout_corr/kl:0.0007533251773566008 - rollout_corr/k3_kl:0.0007447492680512369 - rollout_corr/rollout_ppl:1.5787256956100464 - rollout_corr/rollout_log_ppl:0.4467264711856842 - rollout_corr/log_ppl_diff:0.000743693788535893 - rollout_corr/log_ppl_abs_diff:0.0013808731455355883 - rollout_corr/log_ppl_diff_max:0.005732059478759766 - rollout_corr/log_ppl_diff_min:-0.00479501485824585 - rollout_corr/ppl_ratio:1.000745177268982 - rollout_corr/chi2_token:0.0014829635620117188 - rollout_corr/chi2_seq:1.4286482334136963 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.044735518521520135 - actor/kl_loss:0.0014791195071666152 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.04473699629306793 - actor/grad_norm:0.4599803686141968 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.35385513305664 - perf/mfu/actor:0.0 - training/global_step:7 - training/epoch:0 - critic/score/mean:0.3687500059604645 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.3687500059604645 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.044735513627529144 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.044735513627529144 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:709.2249755859375 - response_length/max:1024.0 - response_length/min:205.0 - response_length/clip_ratio:0.328125 - response_length_non_aborted/mean:709.2249755859375 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:205.0 - response_length_non_aborted/clip_ratio:0.328125 - response/aborted_ratio:0.0 - prompt_length/mean:80.875 - prompt_length/max:149.0 - prompt_length/min:52.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.0678001571213827e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.3142197960005433 - timing_s/agent_loop/generate_sequences/max:16.17259784899943 - timing_s/agent_loop/generate_sequences/mean:11.180543861962372 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0027821990006486885 - timing_s/agent_loop/compute_score/max:0.02510412799892947 - timing_s/agent_loop/compute_score/mean:0.004707435531281589 - timing_s/agent_loop/slowest/generate_sequences:16.156657450999774 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.02128452000033576 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:105 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.288963410002907 - timing_s/reward:1.3810000382363796e-05 - timing_s/old_log_prob:2.9100141620001523 - timing_s/ref:2.6530307559987705 - timing_s/adv:0.012815283000236377 - timing_s/update_actor:9.249542518002272 - timing_s/update_weights:2.0276272629998857 - timing_s/step:33.170513887002016 - timing_s/stop_profile:3.664000178105198e-05 - 
timing_per_token_ms/gen:0.07177272467307143 - timing_per_token_ms/ref:0.010493255426523424 - timing_per_token_ms/update_actor:0.03658374935926731 - timing_per_token_ms/adv:5.0686950228754184e-05 - perf/total_num_tokens:252832 - perf/time_per_step:33.170513887002016 - perf/throughput:1905.5478071676266
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:55:56 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:55:56 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:55:56 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:55:56 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   1%|          | 9/1740 [05:39<17:12:12, 35.78s/it]
(TaskRunner pid=168816) step:8 - global_seqlen/min:60572 - global_seqlen/max:66804 - global_seqlen/minmax_diff:6232 - global_seqlen/balanced_min:62825 - global_seqlen/balanced_max:62828 - global_seqlen/mean:62826.5 - actor/entropy:0.4728289246559143 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.29273033142089844 - training/rollout_probs_diff_mean:0.005711773410439491 - training/rollout_probs_diff_std:0.010544096119701862 - training/rollout_actor_probs_pearson_corr:0.9992071986198425 - rollout_corr/training_ppl:1.570755958557129 - rollout_corr/training_log_ppl:0.4416949152946472 - rollout_corr/kl:0.0007316319388337433 - rollout_corr/k3_kl:0.0007368561928160489 - rollout_corr/rollout_ppl:1.5695925951004028 - rollout_corr/rollout_log_ppl:0.440969854593277 - rollout_corr/log_ppl_diff:0.000725047430023551 - rollout_corr/log_ppl_abs_diff:0.00136377546004951 - rollout_corr/log_ppl_diff_max:0.006784558296203613 - rollout_corr/log_ppl_diff_min:-0.004989206790924072 - rollout_corr/ppl_ratio:1.000726580619812 - rollout_corr/chi2_token:0.001499176025390625 - rollout_corr/chi2_seq:1.0739326477050781 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.03568632410315331 - actor/kl_loss:0.0016770079937487026 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.03568800166249275 - actor/grad_norm:0.47564736008644104 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.2659454345703 - perf/mfu/actor:0.0 - training/global_step:8 - training/epoch:0 - critic/score/mean:0.35624998807907104 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.35624998807907104 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.03568631410598755 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.03568631410598755 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:700.8937377929688 - response_length/max:1024.0 - response_length/min:217.0 - response_length/clip_ratio:0.3031249940395355 - response_length_non_aborted/mean:700.8937377929688 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:217.0 - response_length_non_aborted/clip_ratio:0.3031249940395355 - response/aborted_ratio:0.0 - prompt_length/mean:84.4375 - prompt_length/max:126.0 - prompt_length/min:51.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.920900028082542e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.5089501779984857 - timing_s/agent_loop/generate_sequences/max:16.190395939000155 - timing_s/agent_loop/generate_sequences/mean:11.079467376184345 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002693116999580525 - timing_s/agent_loop/compute_score/max:0.012046339001244633 - timing_s/agent_loop/compute_score/mean:0.004033403606376851 - timing_s/agent_loop/slowest/generate_sequences:16.190395939000155 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.004332328000600683 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:57 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.32138675600072 - timing_s/reward:1.5140001778490841e-05 - timing_s/old_log_prob:2.8856569599993236 - timing_s/ref:2.647963534000155 - timing_s/adv:0.012432572999387048 - timing_s/update_actor:9.185476196998934 - timing_s/update_weights:2.0663246660005825 - timing_s/step:33.14647389400125 - 
timing_s/stop_profile:3.7707999581471086e-05 - timing_per_token_ms/gen:0.07277042149755544 - timing_per_token_ms/ref:0.010536809841389203 - timing_per_token_ms/update_actor:0.03655096255958447 - timing_per_token_ms/adv:4.947185104767514e-05 - perf/total_num_tokens:251306 - perf/time_per_step:33.14647389400125 - perf/throughput:1895.420315322595
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:56:29 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:56:29 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:56:29 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:56:29 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   1%|          | 10/1740 [06:42<21:08:37, 44.00s/it]
(TaskRunner pid=168816) step:9 - global_seqlen/min:55144 - global_seqlen/max:63332 - global_seqlen/minmax_diff:8188 - global_seqlen/balanced_min:60064 - global_seqlen/balanced_max:60067 - global_seqlen/mean:60065.0 - actor/entropy:0.48951256275177 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.22772589325904846 - training/rollout_probs_diff_mean:0.005873494781553745 - training/rollout_probs_diff_std:0.010659972205758095 - training/rollout_actor_probs_pearson_corr:0.9992526173591614 - rollout_corr/training_ppl:1.5894925594329834 - rollout_corr/training_log_ppl:0.45413699746131897 - rollout_corr/kl:0.0006999184261076152 - rollout_corr/k3_kl:0.0007744614849798381 - rollout_corr/rollout_ppl:1.5883052349090576 - rollout_corr/rollout_log_ppl:0.45340561866760254 - rollout_corr/log_ppl_diff:0.0007313207606784999 - rollout_corr/log_ppl_abs_diff:0.001448549795895815 - rollout_corr/log_ppl_diff_max:0.008101493120193481 - rollout_corr/log_ppl_diff_min:-0.0042979419231414795 - rollout_corr/ppl_ratio:1.0007331371307373 - rollout_corr/chi2_token:0.0017205476760864258 - rollout_corr/chi2_seq:1.433769941329956 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.039235089849171345 - actor/kl_loss:0.0019165423509548418 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.03923700749874115 - actor/grad_norm:0.4183718264102936 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.26667022705078 - perf/mfu/actor:0.0 - training/global_step:9 - training/epoch:0 - critic/score/mean:0.4000000059604645 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.4000000059604645 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.039235085248947144 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.039235085248947144 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:667.859375 - response_length/max:1024.0 - response_length/min:196.0 - response_length/clip_ratio:0.28437501192092896 - response_length_non_aborted/mean:667.859375 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:196.0 - response_length_non_aborted/clip_ratio:0.28437501192092896 - response/aborted_ratio:0.0 - prompt_length/mean:82.953125 - prompt_length/max:156.0 - prompt_length/min:54.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.8396999773103744e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.1530595349977375 - timing_s/agent_loop/generate_sequences/max:16.192726231998677 - timing_s/agent_loop/generate_sequences/mean:10.472691333149964 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0027187500018044375 - timing_s/agent_loop/compute_score/max:0.010223240999039263 - timing_s/agent_loop/compute_score/mean:0.0037991497530697415 - timing_s/agent_loop/slowest/generate_sequences:16.192726231998677 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.003766535002796445 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:73 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.304120329998113 - timing_s/reward:1.5076999261509627e-05 - timing_s/old_log_prob:2.8286768139987544 - timing_s/ref:2.609498526999232 - timing_s/adv:0.012564754000777612 - timing_s/update_actor:9.086724828001024 - timing_s/update_weights:2.0293488959978276 - timing_s/step:32.89812787500341 - 
timing_s/stop_profile:3.841099896817468e-05 - timing_per_token_ms/gen:0.0762890781180456 - timing_per_token_ms/ref:0.01086114428951649 - timing_per_token_ms/update_actor:0.03782038137018657 - timing_per_token_ms/adv:5.2296487142169365e-05 - perf/total_num_tokens:240260 - perf/time_per_step:32.89812787500341 - perf/throughput:1825.7877842841772
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:57:03 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:57:03 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:57:03 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:57:03 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(TaskRunner pid=168816) test_gen_batch meta info: {'eos_token_id': 151645, 'pad_token_id': 151643, 'recompute_log_prob': False, 'do_sample': False, 'validate': True, 'global_steps': 10}
(TaskRunner pid=168816) validation generation end
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   1%|          | 11/1740 [07:15<19:32:18, 40.68s/it]
(TaskRunner pid=168816) step:10 - global_seqlen/min:60337 - global_seqlen/max:66597 - global_seqlen/minmax_diff:6260 - global_seqlen/balanced_min:63654 - global_seqlen/balanced_max:63656 - global_seqlen/mean:63655.25 - actor/entropy:0.49275335669517517 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.23070886731147766 - training/rollout_probs_diff_mean:0.005758569575846195 - training/rollout_probs_diff_std:0.0104379178956151 - training/rollout_actor_probs_pearson_corr:0.9992914199829102 - rollout_corr/training_ppl:1.593087911605835 - rollout_corr/training_log_ppl:0.45167645812034607 - rollout_corr/kl:0.000772513507399708 - rollout_corr/k3_kl:0.0007334441761486232 - rollout_corr/rollout_ppl:1.5918786525726318 - rollout_corr/rollout_log_ppl:0.4509614408016205 - rollout_corr/log_ppl_diff:0.0007150179008021951 - rollout_corr/log_ppl_abs_diff:0.0013829529052600265 - rollout_corr/log_ppl_diff_max:0.00607144832611084 - rollout_corr/log_ppl_diff_min:-0.0040104687213897705 - rollout_corr/ppl_ratio:1.0007165670394897 - rollout_corr/chi2_token:0.0013957023620605469 - rollout_corr/chi2_seq:1.3322224617004395 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.03582743586230208 - actor/kl_loss:0.0021654770644090604 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.03582959622144699 - actor/grad_norm:0.4269242584705353 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.27116775512695 - perf/mfu/actor:0.0 - val-aux/openai/gsm8k/reward/mean@1:0.4700530705079606 - val-core/openai/gsm8k/acc/mean@1:0.4700530705079606 - val-aux/num_turns/min:2 - val-aux/num_turns/max:2 - val-aux/num_turns/mean:2.0 - training/global_step:10 - training/epoch:0 - critic/score/mean:0.4625000059604645 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.4625000059604645 - 
critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.03582743927836418 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.03582743927836418 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:713.8468627929688 - response_length/max:1024.0 - response_length/min:220.0 - response_length/clip_ratio:0.3343749940395355 - response_length_non_aborted/mean:713.8468627929688 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:220.0 - response_length_non_aborted/clip_ratio:0.3343749940395355 - response/aborted_ratio:0.0 - prompt_length/mean:81.84375 - prompt_length/max:147.0 - prompt_length/min:48.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.2016002882737666e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.5688114799995674 - timing_s/agent_loop/generate_sequences/max:16.245618119999563 - timing_s/agent_loop/generate_sequences/mean:11.239319488290686 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002684494000277482 - timing_s/agent_loop/compute_score/max:0.014434424996579764 - timing_s/agent_loop/compute_score/mean:0.00430322568442989 - timing_s/agent_loop/slowest/generate_sequences:16.245618119999563 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.003091100003075553 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:78 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.3744594100026 - timing_s/reward:2.3650998627999797e-05 - timing_s/old_log_prob:3.7081985289987642 - timing_s/ref:2.66145841999969 - 
timing_s/adv:0.012486487998103257 - timing_s/update_actor:9.30443403100071 - timing_s/update_weights:2.0321168189984746 - timing_s/step:34.12139413600016 - timing_s/testing:28.278870336002 - timing_s/stop_profile:7.832899791537784e-05 - timing_per_token_ms/gen:0.07168229973165902 - timing_per_token_ms/ref:0.010452627316677296 - timing_per_token_ms/update_actor:0.036542288464033644 - timing_per_token_ms/adv:4.9039505767800995e-05 - perf/total_num_tokens:254621 - perf/time_per_step:34.12139413600016 - perf/throughput:1865.5524374615106
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:58:05 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:58:05 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:58:05 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:58:05 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   1%|          | 12/1740 [07:48<18:23:42, 38.32s/it]
(TaskRunner pid=168816) step:11 - global_seqlen/min:58380 - global_seqlen/max:64665 - global_seqlen/minmax_diff:6285 - global_seqlen/balanced_min:62097 - global_seqlen/balanced_max:62101 - global_seqlen/mean:62098.75 - actor/entropy:0.5166388750076294 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.27183669805526733 - training/rollout_probs_diff_mean:0.005929682403802872 - training/rollout_probs_diff_std:0.01053947675973177 - training/rollout_actor_probs_pearson_corr:0.999276340007782 - rollout_corr/training_ppl:1.623020887374878 - rollout_corr/training_log_ppl:0.4707392156124115 - rollout_corr/kl:0.0006486983620561659 - rollout_corr/k3_kl:0.0007833117851987481 - rollout_corr/rollout_ppl:1.6220163106918335 - rollout_corr/rollout_log_ppl:0.470142662525177 - rollout_corr/log_ppl_diff:0.0005965917953290045 - rollout_corr/log_ppl_abs_diff:0.0013177311047911644 - rollout_corr/log_ppl_diff_max:0.00597342848777771 - rollout_corr/log_ppl_diff_min:-0.0039383769035339355 - rollout_corr/ppl_ratio:1.0005980730056763 - rollout_corr/chi2_token:0.001856088638305664 - rollout_corr/chi2_seq:1.5700435638427734 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.06469402613583952 - actor/kl_loss:0.0028173984082968673 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.06469684094190598 - actor/grad_norm:0.40046408772468567 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.45600509643555 - perf/mfu/actor:0.0 - training/global_step:11 - training/epoch:0 - critic/score/mean:0.453125 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.453125 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.06469402462244034 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.06469402462244034 - 
critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:695.15625 - response_length/max:1024.0 - response_length/min:168.0 - response_length/clip_ratio:0.328125 - response_length_non_aborted/mean:695.15625 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:168.0 - response_length_non_aborted/clip_ratio:0.328125 - response/aborted_ratio:0.0 - prompt_length/mean:81.078125 - prompt_length/max:137.0 - prompt_length/min:48.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.280700159142725e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.7350375139976677 - timing_s/agent_loop/generate_sequences/max:16.12717527299901 - timing_s/agent_loop/generate_sequences/mean:10.89611090364051 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002571662000264041 - timing_s/agent_loop/compute_score/max:0.02821069399942644 - timing_s/agent_loop/compute_score/mean:0.004860748246915137 - timing_s/agent_loop/slowest/generate_sequences:16.12717527299901 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.0042253409992554225 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:70 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.284176411998487 - timing_s/reward:1.6794001567177474e-05 - timing_s/old_log_prob:2.8947319360013353 - timing_s/ref:2.6870694829995045 - timing_s/adv:0.013604301999293966 - timing_s/update_actor:9.193734050000785 - timing_s/update_weights:2.0528029289998813 - timing_s/step:33.15491325999756 - timing_s/stop_profile:5.1587998314062133e-05 - timing_per_token_ms/gen:0.0732037599999932 - 
timing_per_token_ms/ref:0.010817727744115237 - timing_per_token_ms/update_actor:0.03701255681475386 - timing_per_token_ms/adv:5.476882384626891e-05 - perf/total_num_tokens:248395 - perf/time_per_step:33.15491325999756 - perf/throughput:1872.987858934443
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:58:38 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:58:38 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:58:38 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:58:38 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   1%|          | 13/1740 [08:21<17:35:04, 36.66s/it]
(TaskRunner pid=168816) step:12 - global_seqlen/min:60169 - global_seqlen/max:65179 - global_seqlen/minmax_diff:5010 - global_seqlen/balanced_min:62758 - global_seqlen/balanced_max:62761 - global_seqlen/mean:62759.75 - actor/entropy:0.48622676730155945 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.23676803708076477 - training/rollout_probs_diff_mean:0.005723754875361919 - training/rollout_probs_diff_std:0.010485450737178326 - training/rollout_actor_probs_pearson_corr:0.9992698431015015 - rollout_corr/training_ppl:1.5972082614898682 - rollout_corr/training_log_ppl:0.45712488889694214 - rollout_corr/kl:0.0007129263831302524 - rollout_corr/k3_kl:0.0007378763984888792 - rollout_corr/rollout_ppl:1.5960766077041626 - rollout_corr/rollout_log_ppl:0.45641255378723145 - rollout_corr/log_ppl_diff:0.0007123060058802366 - rollout_corr/log_ppl_abs_diff:0.001416441984474659 - rollout_corr/log_ppl_diff_max:0.006478309631347656 - rollout_corr/log_ppl_diff_min:-0.00805586576461792 - rollout_corr/ppl_ratio:1.0007140636444092 - rollout_corr/chi2_token:0.0015299320220947266 - rollout_corr/chi2_seq:3.408018112182617 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.05639507524028886 - actor/kl_loss:0.0033913784918695455 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.05639846622943878 - actor/grad_norm:0.43239864706993103 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.4695816040039 - perf/mfu/actor:0.0 - training/global_step:12 - training/epoch:0 - critic/score/mean:0.4906249940395355 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.4906249940395355 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.05639507621526718 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.05639507621526718 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:699.5281372070312 - response_length/max:1024.0 - response_length/min:213.0 - response_length/clip_ratio:0.28437501192092896 - response_length_non_aborted/mean:699.5281372070312 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:213.0 - response_length_non_aborted/clip_ratio:0.28437501192092896 - response/aborted_ratio:0.0 - prompt_length/mean:84.96875 - prompt_length/max:157.0 - prompt_length/min:49.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.472499833558686e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.3754896860009467 - timing_s/agent_loop/generate_sequences/max:16.094577839998237 - timing_s/agent_loop/generate_sequences/mean:10.956283456137589 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0028365819998725783 - timing_s/agent_loop/compute_score/max:0.015213960999972187 - timing_s/agent_loop/compute_score/mean:0.004081486431095982 - timing_s/agent_loop/slowest/generate_sequences:16.094577839998237 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.004563706002954859 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:66 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.211642704998667 - timing_s/reward:2.0580999262165278e-05 - timing_s/old_log_prob:2.8802218179989723 - timing_s/ref:2.653243536999071 - timing_s/adv:0.012580681999679655 - timing_s/update_actor:9.16039946699675 - timing_s/update_weights:1.9755740540022089 - timing_s/step:32.92176387500149 - 
timing_s/stop_profile:3.8490001315949485e-05 - timing_per_token_ms/gen:0.0724222252723875 - timing_per_token_ms/ref:0.010569049179605841 - timing_per_token_ms/update_actor:0.0364899456538496 - timing_per_token_ms/adv:5.011445233481513e-05 - perf/total_num_tokens:251039 - perf/time_per_step:32.92176387500149 - perf/throughput:1906.330117617283
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:59:10 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:59:10 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:59:10 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:59:10 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   1%|          | 14/1740 [08:54<17:09:08, 35.78s/it]
(TaskRunner pid=168816) step:13 - global_seqlen/min:53671 - global_seqlen/max:69857 - global_seqlen/minmax_diff:16186 - global_seqlen/balanced_min:59175 - global_seqlen/balanced_max:59176 - global_seqlen/mean:59175.75 - actor/entropy:0.4903258681297302 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.26740601658821106 - training/rollout_probs_diff_mean:0.005810699425637722 - training/rollout_probs_diff_std:0.010563316754996777 - training/rollout_actor_probs_pearson_corr:0.9992461204528809 - rollout_corr/training_ppl:1.5854477882385254 - rollout_corr/training_log_ppl:0.44811883568763733 - rollout_corr/kl:0.0007533442694693804 - rollout_corr/k3_kl:0.0007572065806016326 - rollout_corr/rollout_ppl:1.5842294692993164 - rollout_corr/rollout_log_ppl:0.44736942648887634 - rollout_corr/log_ppl_diff:0.0007494155433960259 - rollout_corr/log_ppl_abs_diff:0.0013686673482879996 - rollout_corr/log_ppl_diff_max:0.005366384983062744 - rollout_corr/log_ppl_diff_min:-0.0038236677646636963 - rollout_corr/ppl_ratio:1.0007508993148804 - rollout_corr/chi2_token:0.0015301704406738281 - rollout_corr/chi2_seq:1.273806095123291 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.04703373063239269 - actor/kl_loss:0.004330739475335577 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.047038063406944275 - actor/grad_norm:0.3631048798561096 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.47672271728516 - perf/mfu/actor:0.0 - training/global_step:13 - training/epoch:0 - critic/score/mean:0.515625 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.515625 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04703372344374657 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.04703372344374657 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:658.3218994140625 - response_length/max:1024.0 - response_length/min:226.0 - response_length/clip_ratio:0.27812498807907104 - response_length_non_aborted/mean:658.3218994140625 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:226.0 - response_length_non_aborted/clip_ratio:0.27812498807907104 - response/aborted_ratio:0.0 - prompt_length/mean:81.375 - prompt_length/max:143.0 - prompt_length/min:47.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.685799987986684e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.621940609999001 - timing_s/agent_loop/generate_sequences/max:16.080402628998854 - timing_s/agent_loop/generate_sequences/mean:10.306781847149898 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002703678001125809 - timing_s/agent_loop/compute_score/max:0.020809278001252096 - timing_s/agent_loop/compute_score/mean:0.004332197859469034 - timing_s/agent_loop/slowest/generate_sequences:16.07998801699796 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.004562148998957127 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:104 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.196079222001572 - timing_s/reward:1.641899871174246e-05 - timing_s/old_log_prob:2.842985702001897 - timing_s/ref:2.616320577002625 - timing_s/adv:0.014174926000123378 - timing_s/update_actor:9.065276350000204 - timing_s/update_weights:2.048789709999255 - timing_s/step:32.81122254199727 - 
timing_s/stop_profile:4.145999992033467e-05 - timing_per_token_ms/gen:0.07688146101594287 - timing_per_token_ms/ref:0.011053178781015133 - timing_per_token_ms/update_actor:0.038298105009231836 - timing_per_token_ms/adv:5.988485993047565e-05 - perf/total_num_tokens:236703 - perf/time_per_step:32.81122254199727 - perf/throughput:1803.5216433723863
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 08:59:44 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 08:59:44 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 08:59:44 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 08:59:44 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   1%|          | 15/1740 [09:54<20:37:28, 43.04s/it]
(TaskRunner pid=168816) step:14 - global_seqlen/min:56854 - global_seqlen/max:63528 - global_seqlen/minmax_diff:6674 - global_seqlen/balanced_min:58881 - global_seqlen/balanced_max:58883 - global_seqlen/mean:58882.25 - actor/entropy:0.5081133246421814 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.37225326895713806 - training/rollout_probs_diff_mean:0.005969363730400801 - training/rollout_probs_diff_std:0.010748980566859245 - training/rollout_actor_probs_pearson_corr:0.9992424249649048 - rollout_corr/training_ppl:1.6052656173706055 - rollout_corr/training_log_ppl:0.4602138102054596 - rollout_corr/kl:0.0007130916928872466 - rollout_corr/k3_kl:0.0007836457225494087 - rollout_corr/rollout_ppl:1.6040245294570923 - rollout_corr/rollout_log_ppl:0.45943984389305115 - rollout_corr/log_ppl_diff:0.0007740089786238968 - rollout_corr/log_ppl_abs_diff:0.001533539965748787 - rollout_corr/log_ppl_diff_max:0.008526891469955444 - rollout_corr/log_ppl_diff_min:-0.004660815000534058 - rollout_corr/ppl_ratio:1.0007758140563965 - rollout_corr/chi2_token:0.0017277002334594727 - rollout_corr/chi2_seq:2.443784713745117 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.039584462022493494 - actor/kl_loss:0.00529480428667739 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.03958975523710251 - actor/grad_norm:0.42083442211151123 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.46985244750977 - perf/mfu/actor:0.0 - training/global_step:14 - training/epoch:0 - critic/score/mean:0.5062500238418579 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.5062500238418579 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.03958446532487869 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.03958446532487869 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:653.3406372070312 - response_length/max:1024.0 - response_length/min:170.0 - response_length/clip_ratio:0.26249998807907104 - response_length_non_aborted/mean:653.3406372070312 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:170.0 - response_length_non_aborted/clip_ratio:0.26249998807907104 - response/aborted_ratio:0.0 - prompt_length/mean:82.6875 - prompt_length/max:194.0 - prompt_length/min:49.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.7776000428711995e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.787506086999201 - timing_s/agent_loop/generate_sequences/max:16.14261315699696 - timing_s/agent_loop/generate_sequences/mean:10.26481141186548 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0027491090004332364 - timing_s/agent_loop/compute_score/max:0.016515236002305755 - timing_s/agent_loop/compute_score/mean:0.00441306419064631 - timing_s/agent_loop/slowest/generate_sequences:16.139473397000984 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.008092134001344675 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:95 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.264952661997086 - timing_s/reward:1.424899892299436e-05 - timing_s/old_log_prob:3.759496521000983 - timing_s/ref:2.636570260998269 - timing_s/adv:0.014096527000219794 - timing_s/update_actor:9.033088358999521 - timing_s/update_weights:1.9990395529966918 - timing_s/step:33.736196955000196 - 
timing_s/stop_profile:3.792600182350725e-05 - timing_per_token_ms/gen:0.07779705581409528 - timing_per_token_ms/ref:0.011194248950228079 - timing_per_token_ms/update_actor:0.03835234030204145 - timing_per_token_ms/adv:5.985049399530331e-05 - perf/total_num_tokens:235529 - perf/time_per_step:33.736196955000196 - perf/throughput:1745.3730803902245
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:00:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:00:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:00:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:00:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(TaskRunner pid=168816) test_gen_batch meta info: {'eos_token_id': 151645, 'pad_token_id': 151643, 'recompute_log_prob': False, 'do_sample': False, 'validate': True, 'global_steps': 15}
(TaskRunner pid=168816) validation generation end
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   1%|          | 16/1740 [10:27<19:10:50, 40.05s/it]
(TaskRunner pid=168816) step:15 - global_seqlen/min:51809 - global_seqlen/max:64609 - global_seqlen/minmax_diff:12800 - global_seqlen/balanced_min:59980 - global_seqlen/balanced_max:59983 - global_seqlen/mean:59981.5 - actor/entropy:0.4763682186603546 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.7548202872276306 - training/rollout_probs_diff_mean:0.005843721330165863 - training/rollout_probs_diff_std:0.010806595906615257 - training/rollout_actor_probs_pearson_corr:0.9992017149925232 - rollout_corr/training_ppl:1.5723522901535034 - rollout_corr/training_log_ppl:0.44349169731140137 - rollout_corr/kl:0.0006575255538336933 - rollout_corr/k3_kl:0.0007667010650038719 - rollout_corr/rollout_ppl:1.5713672637939453 - rollout_corr/rollout_log_ppl:0.4428951144218445 - rollout_corr/log_ppl_diff:0.0005965259042568505 - rollout_corr/log_ppl_abs_diff:0.0013581730891019106 - rollout_corr/log_ppl_diff_max:0.005345404148101807 - rollout_corr/log_ppl_diff_min:-0.006841391324996948 - rollout_corr/ppl_ratio:1.0005980730056763 - rollout_corr/chi2_token:0.0017671585083007812 - rollout_corr/chi2_seq:3.3839707374572754 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.03804134082747623 - actor/kl_loss:0.005634129822283285 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.038046978414058685 - actor/grad_norm:0.41472145915031433 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.49716186523438 - perf/mfu/actor:0.0 - val-aux/openai/gsm8k/reward/mean@1:0.530705079605762 - val-core/openai/gsm8k/acc/mean@1:0.530705079605762 - val-aux/num_turns/min:2 - val-aux/num_turns/max:2 - val-aux/num_turns/mean:2.0 - training/global_step:15 - training/epoch:0 - critic/score/mean:0.5687500238418579 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.5687500238418579 - 
critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.038041338324546814 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.038041338324546814 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:666.7531127929688 - response_length/max:1024.0 - response_length/min:194.0 - response_length/clip_ratio:0.27812498807907104 - response_length_non_aborted/mean:666.7531127929688 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:194.0 - response_length_non_aborted/clip_ratio:0.27812498807907104 - response/aborted_ratio:0.0 - prompt_length/mean:83.015625 - prompt_length/max:161.0 - prompt_length/min:51.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.701700163423084e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.173966211001243 - timing_s/agent_loop/generate_sequences/max:16.182333510001627 - timing_s/agent_loop/generate_sequences/mean:10.495174327012535 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002704634000110673 - timing_s/agent_loop/compute_score/max:0.010801134001667378 - timing_s/agent_loop/compute_score/mean:0.00392920712505429 - timing_s/agent_loop/slowest/generate_sequences:16.18022331999964 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.009178299002087442 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:74 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.304454014996736 - timing_s/reward:2.2234002244658768e-05 - timing_s/old_log_prob:2.849905563998618 - timing_s/ref:2.6245389779978723 
- timing_s/adv:0.012748742003168445 - timing_s/update_actor:9.081459057000757 - timing_s/update_weights:1.9970871309997165 - timing_s/step:32.89905144699878 - timing_s/testing:26.976013557999977 - timing_s/stop_profile:9.711399980005808e-05 - timing_per_token_ms/gen:0.07641721783735891 - timing_per_token_ms/ref:0.010938951918499338 - timing_per_token_ms/update_actor:0.03785108348824536 - timing_per_token_ms/adv:5.3136141990315536e-05 - perf/total_num_tokens:239926 - perf/time_per_step:32.89905144699878 - perf/throughput:1823.1984620174153
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:01:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:01:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:01:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:01:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   1%|          | 17/1740 [11:00<18:07:51, 37.88s/it]
(TaskRunner pid=168816) step:16 - global_seqlen/min:58115 - global_seqlen/max:63498 - global_seqlen/minmax_diff:5383 - global_seqlen/balanced_min:61382 - global_seqlen/balanced_max:61384 - global_seqlen/mean:61383.0 - actor/entropy:0.4860338866710663 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.23902937769889832 - training/rollout_probs_diff_mean:0.005796035751700401 - training/rollout_probs_diff_std:0.010559595189988613 - training/rollout_actor_probs_pearson_corr:0.9992290735244751 - rollout_corr/training_ppl:1.567757248878479 - rollout_corr/training_log_ppl:0.4377087950706482 - rollout_corr/kl:0.0009305548737756908 - rollout_corr/k3_kl:0.000751456362195313 - rollout_corr/rollout_ppl:1.566310167312622 - rollout_corr/rollout_log_ppl:0.4368000030517578 - rollout_corr/log_ppl_diff:0.0009087914368137717 - rollout_corr/log_ppl_abs_diff:0.0014952889177948236 - rollout_corr/log_ppl_diff_max:0.006105780601501465 - rollout_corr/log_ppl_diff_min:-0.0062399208545684814 - rollout_corr/ppl_ratio:1.000910758972168 - rollout_corr/chi2_token:0.0011591911315917969 - rollout_corr/chi2_seq:0.6195968389511108 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.035850027119522565 - actor/kl_loss:0.006025656843121396 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.03585605323314667 - actor/grad_norm:0.390458345413208 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.68446731567383 - perf/mfu/actor:0.0 - training/global_step:16 - training/epoch:0 - critic/score/mean:0.48124998807907104 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.48124998807907104 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.03585003688931465 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.03585003688931465 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:678.4124755859375 - response_length/max:1024.0 - response_length/min:222.0 - response_length/clip_ratio:0.30937498807907104 - response_length_non_aborted/mean:678.4124755859375 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:222.0 - response_length_non_aborted/clip_ratio:0.30937498807907104 - response/aborted_ratio:0.0 - prompt_length/mean:88.875 - prompt_length/max:163.0 - prompt_length/min:55.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.9579000915400684e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.6332162949984195 - timing_s/agent_loop/generate_sequences/max:16.264381584998773 - timing_s/agent_loop/generate_sequences/mean:10.746313273543796 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0027032670004700776 - timing_s/agent_loop/compute_score/max:0.020010490999993635 - timing_s/agent_loop/compute_score/mean:0.004321964290795677 - timing_s/agent_loop/slowest/generate_sequences:16.264381584998773 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.0030897180004103575 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:71 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.38534863300083 - timing_s/reward:1.61239986482542e-05 - timing_s/old_log_prob:2.886696708999807 - timing_s/ref:2.625262722001935 - timing_s/adv:0.013130318999174051 - timing_s/update_actor:9.149327651997737 - timing_s/update_weights:2.013115411999024 - timing_s/step:33.10380882499885 - 
timing_s/stop_profile:3.671400190796703e-05 - timing_per_token_ms/gen:0.07547651978424277 - timing_per_token_ms/ref:0.010692140828901874 - timing_per_token_ms/update_actor:0.03726327994720744 - timing_per_token_ms/adv:5.347701724896979e-05 - perf/total_num_tokens:245532 - perf/time_per_step:33.10380882499885 - perf/throughput:1854.2579291856496
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:01:50 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:01:50 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:01:50 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:01:50 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   1%|          | 18/1740 [11:34<17:31:26, 36.64s/it]
(TaskRunner pid=168816) step:17 - global_seqlen/min:53862 - global_seqlen/max:63014 - global_seqlen/minmax_diff:9152 - global_seqlen/balanced_min:60185 - global_seqlen/balanced_max:60188 - global_seqlen/mean:60187.0 - actor/entropy:0.48076343536376953 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.28627169132232666 - training/rollout_probs_diff_mean:0.005828942637890577 - training/rollout_probs_diff_std:0.010596639476716518 - training/rollout_actor_probs_pearson_corr:0.9992518424987793 - rollout_corr/training_ppl:1.5746735334396362 - rollout_corr/training_log_ppl:0.4414962828159332 - rollout_corr/kl:0.0008364070090465248 - rollout_corr/k3_kl:0.0007703916053287685 - rollout_corr/rollout_ppl:1.5733253955841064 - rollout_corr/rollout_log_ppl:0.4406612515449524 - rollout_corr/log_ppl_diff:0.0008349834242835641 - rollout_corr/log_ppl_abs_diff:0.0014619033318012953 - rollout_corr/log_ppl_diff_max:0.006263375282287598 - rollout_corr/log_ppl_diff_min:-0.004083096981048584 - rollout_corr/ppl_ratio:1.0008366107940674 - rollout_corr/chi2_token:0.0014262199401855469 - rollout_corr/chi2_seq:2.2997329235076904 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.047645398763961566 - actor/kl_loss:0.007227621641504811 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.04765262454748154 - actor/grad_norm:0.4192851185798645 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.6451416015625 - perf/mfu/actor:0.0 - training/global_step:17 - training/epoch:0 - critic/score/mean:0.5218750238418579 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.5218750238418579 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04764540493488312 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.04764540493488312 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:664.2750244140625 - response_length/max:1024.0 - response_length/min:214.0 - response_length/clip_ratio:0.23749999701976776 - response_length_non_aborted/mean:664.2750244140625 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:214.0 - response_length_non_aborted/clip_ratio:0.23749999701976776 - response/aborted_ratio:0.0 - prompt_length/mean:88.0625 - prompt_length/max:235.0 - prompt_length/min:61.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.0097998609999195e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.4672949129999324 - timing_s/agent_loop/generate_sequences/max:16.130911319996812 - timing_s/agent_loop/generate_sequences/mean:10.469219622000105 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0026290409987268504 - timing_s/agent_loop/compute_score/max:0.013128522001352394 - timing_s/agent_loop/compute_score/mean:0.0038356733467594497 - timing_s/agent_loop/slowest/generate_sequences:16.130911319996812 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.00549876999866683 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:83 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.24298056599946 - timing_s/reward:1.4377001207321882e-05 - timing_s/old_log_prob:2.8310391920022084 - timing_s/ref:2.6195236120001937 - timing_s/adv:0.012635728999157436 - timing_s/update_actor:9.152822083000501 - timing_s/update_weights:1.9362938209997083 - timing_s/step:32.82389551199958 - 
timing_s/stop_profile:4.7653000365244225e-05 - timing_per_token_ms/gen:0.07641310341161163 - timing_per_token_ms/ref:0.01088076998355207 - timing_per_token_ms/update_actor:0.03801826840929312 - timing_per_token_ms/adv:5.2485291670782046e-05 - perf/total_num_tokens:240748 - perf/time_per_step:32.82389551199958 - perf/throughput:1833.6336702630913
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:02:24 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:02:24 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:02:24 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:02:24 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   1%|          | 19/1740 [12:06<16:53:38, 35.34s/it]
(TaskRunner pid=168816) step:18 - global_seqlen/min:56263 - global_seqlen/max:60944 - global_seqlen/minmax_diff:4681 - global_seqlen/balanced_min:58228 - global_seqlen/balanced_max:58229 - global_seqlen/mean:58228.25 - actor/entropy:0.5089429616928101 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.5759397149085999 - training/rollout_probs_diff_mean:0.00602225074544549 - training/rollout_probs_diff_std:0.01085000578314066 - training/rollout_actor_probs_pearson_corr:0.9992107152938843 - rollout_corr/training_ppl:1.5972206592559814 - rollout_corr/training_log_ppl:0.45514926314353943 - rollout_corr/kl:0.0008028328302316368 - rollout_corr/k3_kl:0.0007894729496911168 - rollout_corr/rollout_ppl:1.5959594249725342 - rollout_corr/rollout_log_ppl:0.45435720682144165 - rollout_corr/log_ppl_diff:0.0007920815842226148 - rollout_corr/log_ppl_abs_diff:0.0014881505630910397 - rollout_corr/log_ppl_diff_max:0.006014317274093628 - rollout_corr/log_ppl_diff_min:-0.0034145712852478027 - rollout_corr/ppl_ratio:1.0007938146591187 - rollout_corr/chi2_token:0.0015696287155151367 - rollout_corr/chi2_seq:1.3326420783996582 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.0558336070042717 - actor/kl_loss:0.006934289176570019 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.05584053695201874 - actor/grad_norm:0.4100557565689087 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.6640167236328 - perf/mfu/actor:0.0 - training/global_step:18 - training/epoch:0 - critic/score/mean:0.534375011920929 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.534375011920929 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.055833615362644196 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.055833615362644196 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:647.0250244140625 - response_length/max:1024.0 - response_length/min:215.0 - response_length/clip_ratio:0.2874999940395355 - response_length_non_aborted/mean:647.0250244140625 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:215.0 - response_length_non_aborted/clip_ratio:0.2874999940395355 - response/aborted_ratio:0.0 - prompt_length/mean:80.828125 - prompt_length/max:157.0 - prompt_length/min:45.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.091400089440867e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.480695623999054 - timing_s/agent_loop/generate_sequences/max:16.123993196997617 - timing_s/agent_loop/generate_sequences/mean:10.121382060896826 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0027168089982296806 - timing_s/agent_loop/compute_score/max:0.016185786000278313 - timing_s/agent_loop/compute_score/mean:0.004166965468675699 - timing_s/agent_loop/slowest/generate_sequences:16.11493839600007 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.016185786000278313 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:111 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.234735892998287 - timing_s/reward:1.8273000023327768e-05 - timing_s/old_log_prob:2.8034331089984335 - timing_s/ref:3.5758025780014577 - timing_s/adv:0.015771630001836456 - timing_s/update_actor:9.062873621001927 - timing_s/update_weights:2.006462494999141 - timing_s/step:33.726558854999894 - 
timing_s/stop_profile:3.703099719132297e-05 - timing_per_token_ms/gen:0.07841049366812665 - timing_per_token_ms/ref:0.015352524668015344 - timing_per_token_ms/update_actor:0.038910982302413036 - timing_per_token_ms/adv:6.771468317284332e-05 - perf/total_num_tokens:232913 - perf/time_per_step:33.726558854999894 - perf/throughput:1726.480612811401
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:02:56 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:02:56 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:02:56 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:02:56 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(WorkerDict pid=170437) /usr/local/lib/python3.12/dist-packages/torch_gcu/gcu/memory.py:787: UserWarning: [GCU Allocator] Allocator config updated, backend unchanged: native (Triggered internally at /home/gongxijun/icode/src_tops/dev/dev2_1/torch_gcu/torch_gcu/csrc/gcu/gcu_allocator_config.cpp:306.)
(WorkerDict pid=170437)   return _C._gcu_gcuCachingAllocator_set_allocator_settings(env)
(WorkerDict pid=170434) /usr/local/lib/python3.12/dist-packages/torch_gcu/gcu/memory.py:787: UserWarning: [GCU Allocator] Allocator config updated, backend unchanged: native (Triggered internally at /home/gongxijun/icode/src_tops/dev/dev2_1/torch_gcu/torch_gcu/csrc/gcu/gcu_allocator_config.cpp:306.)
(WorkerDict pid=170434)   return _C._gcu_gcuCachingAllocator_set_allocator_settings(env)
(WorkerDict pid=170436) /usr/local/lib/python3.12/dist-packages/torch_gcu/gcu/memory.py:787: UserWarning: [GCU Allocator] Allocator config updated, backend unchanged: native (Triggered internally at /home/gongxijun/icode/src_tops/dev/dev2_1/torch_gcu/torch_gcu/csrc/gcu/gcu_allocator_config.cpp:306.)
(WorkerDict pid=170436)   return _C._gcu_gcuCachingAllocator_set_allocator_settings(env)
(WorkerDict pid=170435) /usr/local/lib/python3.12/dist-packages/torch_gcu/gcu/memory.py:787: UserWarning: [GCU Allocator] Allocator config updated, backend unchanged: native (Triggered internally at /home/gongxijun/icode/src_tops/dev/dev2_1/torch_gcu/torch_gcu/csrc/gcu/gcu_allocator_config.cpp:306.)
(WorkerDict pid=170435)   return _C._gcu_gcuCachingAllocator_set_allocator_settings(env)
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   1%|          | 20/1740 [13:08<20:40:07, 43.26s/it]
(TaskRunner pid=168816) step:19 - global_seqlen/min:47256 - global_seqlen/max:56667 - global_seqlen/minmax_diff:9411 - global_seqlen/balanced_min:51061 - global_seqlen/balanced_max:51065 - global_seqlen/mean:51063.0 - actor/entropy:0.4646935760974884 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.1850496530532837 - training/rollout_probs_diff_mean:0.005832221359014511 - training/rollout_probs_diff_std:0.010722221806645393 - training/rollout_actor_probs_pearson_corr:0.9991915822029114 - rollout_corr/training_ppl:1.5320274829864502 - rollout_corr/training_log_ppl:0.4154438376426697 - rollout_corr/kl:0.0007595553179271519 - rollout_corr/k3_kl:0.0007666436722502112 - rollout_corr/rollout_ppl:1.530781626701355 - rollout_corr/rollout_log_ppl:0.4146111011505127 - rollout_corr/log_ppl_diff:0.0008327728137373924 - rollout_corr/log_ppl_abs_diff:0.001674116007052362 - rollout_corr/log_ppl_diff_max:0.007585674524307251 - rollout_corr/log_ppl_diff_min:-0.0038227438926696777 - rollout_corr/ppl_ratio:1.0008349418640137 - rollout_corr/chi2_token:0.0015511512756347656 - rollout_corr/chi2_seq:1.08790922164917 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.0451529195306648 - actor/kl_loss:0.008944992543547414 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.04516186565160751 - actor/grad_norm:0.45307138562202454 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.65197372436523 - perf/mfu/actor:0.0 - training/global_step:19 - training/epoch:0 - critic/score/mean:0.653124988079071 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.653124988079071 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04515291750431061 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.04515291750431061 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:561.4906005859375 - response_length/max:1024.0 - response_length/min:141.0 - response_length/clip_ratio:0.17812499403953552 - response_length_non_aborted/mean:561.4906005859375 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:141.0 - response_length_non_aborted/clip_ratio:0.17812499403953552 - response/aborted_ratio:0.0 - prompt_length/mean:76.796875 - prompt_length/max:128.0 - prompt_length/min:53.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.116099782753736e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.34841650999806 - timing_s/agent_loop/generate_sequences/max:16.166845117000776 - timing_s/agent_loop/generate_sequences/mean:8.844176384374952 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002633979998790892 - timing_s/agent_loop/compute_score/max:0.00894211400009226 - timing_s/agent_loop/compute_score/mean:0.0036251589249673088 - timing_s/agent_loop/slowest/generate_sequences:16.166845117000776 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.0034498180029913783 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:79 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.28022271199734 - timing_s/reward:2.1011997887399048e-05 - timing_s/old_log_prob:2.776754652000818 - timing_s/ref:2.5290313139994396 - timing_s/adv:0.014320917998702498 - timing_s/update_actor:8.695544209997024 - timing_s/update_weights:1.9869934089983872 - timing_s/step:32.311731621000945 - 
timing_s/stop_profile:6.759700045222417e-05 - timing_per_token_ms/gen:0.0906082732458653 - timing_per_token_ms/ref:0.012381917014273738 - timing_per_token_ms/update_actor:0.04257262699996585 - timing_per_token_ms/adv:7.011396705394561e-05 - perf/total_num_tokens:204252 - perf/time_per_step:32.311731621000945 - perf/throughput:1580.3238464264696
(TaskRunner pid=168816) local_global_step_folder: checkpoints/verl_grpo_enflame_fl/qwen3_0.6b_enflame_fl/global_step_20
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:03:31 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:03:31 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:03:31 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:03:31 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(TaskRunner pid=168816) test_gen_batch meta info: {'eos_token_id': 151645, 'pad_token_id': 151643, 'recompute_log_prob': False, 'do_sample': False, 'validate': True, 'global_steps': 20}
(TaskRunner pid=168816) validation generation end
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   1%|          | 21/1740 [13:41<19:09:40, 40.13s/it]
(TaskRunner pid=168816) step:20 - global_seqlen/min:49644 - global_seqlen/max:63631 - global_seqlen/minmax_diff:13987 - global_seqlen/balanced_min:55866 - global_seqlen/balanced_max:55869 - global_seqlen/mean:55867.75 - actor/entropy:0.4750954508781433 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.32928991317749023 - training/rollout_probs_diff_mean:0.005818525329232216 - training/rollout_probs_diff_std:0.010717230848968029 - training/rollout_actor_probs_pearson_corr:0.9992057681083679 - rollout_corr/training_ppl:1.5482028722763062 - rollout_corr/training_log_ppl:0.426230251789093 - rollout_corr/kl:0.0006806942401453853 - rollout_corr/k3_kl:0.0007661313284188509 - rollout_corr/rollout_ppl:1.5471845865249634 - rollout_corr/rollout_log_ppl:0.4255986213684082 - rollout_corr/log_ppl_diff:0.0006315883947536349 - rollout_corr/log_ppl_abs_diff:0.0013949668500572443 - rollout_corr/log_ppl_diff_max:0.005883634090423584 - rollout_corr/log_ppl_diff_min:-0.004355669021606445 - rollout_corr/ppl_ratio:1.0006331205368042 - rollout_corr/chi2_token:0.001717686653137207 - rollout_corr/chi2_seq:1.0985441207885742 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.0586944329797916 - actor/kl_loss:0.009786091046407819 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.05870421603322029 - actor/grad_norm:0.42359501123428345 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.67327499389648 - perf/mfu/actor:0.0 - val-aux/openai/gsm8k/reward/mean@1:0.5974222896133434 - val-core/openai/gsm8k/acc/mean@1:0.5974222896133434 - val-aux/num_turns/min:2 - val-aux/num_turns/max:2 - val-aux/num_turns/mean:2.0 - training/global_step:20 - training/epoch:0 - critic/score/mean:0.6000000238418579 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.6000000238418579 - 
critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.05869443714618683 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.05869443714618683 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:618.9874877929688 - response_length/max:1024.0 - response_length/min:178.0 - response_length/clip_ratio:0.25312501192092896 - response_length_non_aborted/mean:618.9874877929688 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:178.0 - response_length_non_aborted/clip_ratio:0.25312501192092896 - response/aborted_ratio:0.0 - prompt_length/mean:79.359375 - prompt_length/max:140.0 - prompt_length/min:45.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.032400076743215e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.910614518001239 - timing_s/agent_loop/generate_sequences/max:16.212441876999947 - timing_s/agent_loop/generate_sequences/mean:9.679704941843625 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002558132000558544 - timing_s/agent_loop/compute_score/max:0.017352054997900268 - timing_s/agent_loop/compute_score/mean:0.004064114003142549 - timing_s/agent_loop/slowest/generate_sequences:16.212441876999947 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.005048384999099653 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:86 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.3279888879988 - timing_s/reward:1.5940000594127923e-05 - timing_s/old_log_prob:2.863342189000832 - timing_s/ref:2.578780119998555 - 
timing_s/adv:0.012626473002455896 - timing_s/update_actor:8.880359039001632 - timing_s/save_checkpoint:2.507479500000045 - timing_s/update_weights:1.9864698059973307 - timing_s/step:35.185345284000505 - timing_s/testing:26.531068257998413 - timing_s/stop_profile:8.032000187085941e-05 - timing_per_token_ms/gen:0.08243294941335043 - timing_per_token_ms/ref:0.01153966340150872 - timing_per_token_ms/update_actor:0.03973830626345983 - timing_per_token_ms/adv:5.6501617670551865e-05 - perf/total_num_tokens:223471 - perf/time_per_step:35.185345284000505 - perf/throughput:1587.8130383277553
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:04:30 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:04:31 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:04:31 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:04:31 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   1%|▏         | 22/1740 [14:13<18:03:01, 37.82s/it]
(TaskRunner pid=168816) step:21 - global_seqlen/min:50708 - global_seqlen/max:61752 - global_seqlen/minmax_diff:11044 - global_seqlen/balanced_min:55557 - global_seqlen/balanced_max:55558 - global_seqlen/mean:55557.5 - actor/entropy:0.4855477511882782 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.5754918456077576 - training/rollout_probs_diff_mean:0.005848593544214964 - training/rollout_probs_diff_std:0.010658876039087772 - training/rollout_actor_probs_pearson_corr:0.9992403984069824 - rollout_corr/training_ppl:1.5662895441055298 - rollout_corr/training_log_ppl:0.43644094467163086 - rollout_corr/kl:0.000759393849875778 - rollout_corr/k3_kl:0.0007735519902780652 - rollout_corr/rollout_ppl:1.5650724172592163 - rollout_corr/rollout_log_ppl:0.43566590547561646 - rollout_corr/log_ppl_diff:0.0007750717923045158 - rollout_corr/log_ppl_abs_diff:0.0015368061140179634 - rollout_corr/log_ppl_diff_max:0.006921708583831787 - rollout_corr/log_ppl_diff_min:-0.004295378923416138 - rollout_corr/ppl_ratio:1.000777006149292 - rollout_corr/chi2_token:0.0015873908996582031 - rollout_corr/chi2_seq:1.9261066913604736 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.07229252233446459 - actor/kl_loss:0.010349440570280422 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.07230287790298462 - actor/grad_norm:0.44691774249076843 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.8494758605957 - perf/mfu/actor:0.0 - training/global_step:21 - training/epoch:0 - critic/score/mean:0.637499988079071 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.637499988079071 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.0722925141453743 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.0722925141453743 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:612.828125 - response_length/max:1024.0 - response_length/min:210.0 - response_length/clip_ratio:0.23749999701976776 - response_length_non_aborted/mean:612.828125 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:210.0 - response_length_non_aborted/clip_ratio:0.23749999701976776 - response/aborted_ratio:0.0 - prompt_length/mean:81.640625 - prompt_length/max:133.0 - prompt_length/min:52.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.655300290323794e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.409360134999588 - timing_s/agent_loop/generate_sequences/max:16.418982903000142 - timing_s/agent_loop/generate_sequences/mean:9.75670205760299 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0026375650013505947 - timing_s/agent_loop/compute_score/max:0.01515549000032479 - timing_s/agent_loop/compute_score/mean:0.0038316867593152894 - timing_s/agent_loop/slowest/generate_sequences:16.418982903000142 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.003231319999031257 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:63 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.53844905799997 - timing_s/reward:1.598399830982089e-05 - timing_s/old_log_prob:2.807800321999821 - timing_s/ref:2.588454673998058 - timing_s/adv:0.013649114000145346 - timing_s/update_actor:8.843577037001523 - timing_s/update_weights:1.9974587420001626 - timing_s/step:32.81944909400045 - timing_s/stop_profile:6.451300214393996e-05 
- timing_per_token_ms/gen:0.0843346628489838 - timing_per_token_ms/ref:0.011647638365648463 - timing_per_token_ms/update_actor:0.03979470385187204 - timing_per_token_ms/adv:6.141886334043715e-05 - perf/total_num_tokens:222230 - perf/time_per_step:32.81944909400045 - perf/throughput:1692.8224432065856
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:05:03 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:05:03 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:05:03 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:05:03 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   1%|▏         | 23/1740 [14:47<17:25:01, 36.52s/it]
(TaskRunner pid=168816) step:22 - global_seqlen/min:47910 - global_seqlen/max:59406 - global_seqlen/minmax_diff:11496 - global_seqlen/balanced_min:54123 - global_seqlen/balanced_max:54126 - global_seqlen/mean:54124.25 - actor/entropy:0.4494361877441406 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.2981443405151367 - training/rollout_probs_diff_mean:0.005694267805665731 - training/rollout_probs_diff_std:0.010742568410933018 - training/rollout_actor_probs_pearson_corr:0.9991883039474487 - rollout_corr/training_ppl:1.5246068239212036 - rollout_corr/training_log_ppl:0.4131465554237366 - rollout_corr/kl:0.000728218350559473 - rollout_corr/k3_kl:0.0007397804874926805 - rollout_corr/rollout_ppl:1.5235308408737183 - rollout_corr/rollout_log_ppl:0.4124584197998047 - rollout_corr/log_ppl_diff:0.0006880916771478951 - rollout_corr/log_ppl_abs_diff:0.0014181274455040693 - rollout_corr/log_ppl_diff_max:0.005715399980545044 - rollout_corr/log_ppl_diff_min:-0.004373431205749512 - rollout_corr/ppl_ratio:1.0006897449493408 - rollout_corr/chi2_token:0.0015126466751098633 - rollout_corr/chi2_seq:2.0194129943847656 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.06052830167641332 - actor/kl_loss:0.01258629185758764 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.06054088473320007 - actor/grad_norm:0.4259088933467865 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.82838821411133 - perf/mfu/actor:0.0 - training/global_step:22 - training/epoch:0 - critic/score/mean:0.6468750238418579 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.6468750238418579 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.06052829697728157 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.06052829697728157 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:598.3031005859375 - response_length/max:1024.0 - response_length/min:160.0 - response_length/clip_ratio:0.17812499403953552 - response_length_non_aborted/mean:598.3031005859375 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:160.0 - response_length_non_aborted/clip_ratio:0.17812499403953552 - response/aborted_ratio:0.0 - prompt_length/mean:78.25 - prompt_length/max:135.0 - prompt_length/min:53.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.453999983728863e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.5735764389974065 - timing_s/agent_loop/generate_sequences/max:16.23107671699836 - timing_s/agent_loop/generate_sequences/mean:9.455790213971886 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0026660440016712528 - timing_s/agent_loop/compute_score/max:0.00849639299849514 - timing_s/agent_loop/compute_score/mean:0.0036051837125228303 - timing_s/agent_loop/slowest/generate_sequences:16.23107671699836 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.003925879002053989 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:83 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.344550529000117 - timing_s/reward:3.987099989899434e-05 - timing_s/old_log_prob:2.749906231998466 - timing_s/ref:2.554518638000445 - timing_s/adv:0.012555221001093742 - timing_s/update_actor:8.76083010399816 - timing_s/update_weights:1.9938638739986345 - timing_s/step:32.44442007500038 - 
timing_s/stop_profile:3.7934001738904044e-05 - timing_per_token_ms/gen:0.08536930239688346 - timing_per_token_ms/ref:0.011799325801283367 - timing_per_token_ms/update_actor:0.04046628869683257 - timing_per_token_ms/adv:5.7992586507405376e-05 - perf/total_num_tokens:216497 - perf/time_per_step:32.44442007500038 - perf/throughput:1668.2144379490612
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:05:36 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:05:36 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:05:36 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:05:36 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   1%|▏         | 24/1740 [15:19<16:51:50, 35.38s/it]
(TaskRunner pid=168816) step:23 - global_seqlen/min:51538 - global_seqlen/max:60106 - global_seqlen/minmax_diff:8568 - global_seqlen/balanced_min:54801 - global_seqlen/balanced_max:54803 - global_seqlen/mean:54802.25 - actor/entropy:0.48521143198013306 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.28905150294303894 - training/rollout_probs_diff_mean:0.005918782204389572 - training/rollout_probs_diff_std:0.010798006318509579 - training/rollout_actor_probs_pearson_corr:0.999212920665741 - rollout_corr/training_ppl:1.5693069696426392 - rollout_corr/training_log_ppl:0.43889516592025757 - rollout_corr/kl:0.0007670154445804656 - rollout_corr/k3_kl:0.0007780859596095979 - rollout_corr/rollout_ppl:1.568167805671692 - rollout_corr/rollout_log_ppl:0.43819791078567505 - rollout_corr/log_ppl_diff:0.00069724878994748 - rollout_corr/log_ppl_abs_diff:0.0014364216476678848 - rollout_corr/log_ppl_diff_max:0.006469756364822388 - rollout_corr/log_ppl_diff_min:-0.0066370368003845215 - rollout_corr/ppl_ratio:1.0006989240646362 - rollout_corr/chi2_token:0.0016001462936401367 - rollout_corr/chi2_seq:0.7966220378875732 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.05101763391212444 - actor/kl_loss:0.013356839917832986 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.05103098973631859 - actor/grad_norm:0.4753238260746002 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.8376922607422 - perf/mfu/actor:0.0 - training/global_step:23 - training/epoch:0 - critic/score/mean:0.6625000238418579 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.6625000238418579 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.0510176382958889 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.0510176382958889 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:600.0281372070312 - response_length/max:1024.0 - response_length/min:206.0 - response_length/clip_ratio:0.18125000596046448 - response_length_non_aborted/mean:600.0281372070312 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:206.0 - response_length_non_aborted/clip_ratio:0.18125000596046448 - response/aborted_ratio:0.0 - prompt_length/mean:85.0 - prompt_length/max:148.0 - prompt_length/min:54.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.05559983139392e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.35574740699667 - timing_s/agent_loop/generate_sequences/max:16.1690516689996 - timing_s/agent_loop/generate_sequences/mean:9.410772394559535 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002676913998584496 - timing_s/agent_loop/compute_score/max:0.012606837997736875 - timing_s/agent_loop/compute_score/mean:0.003702350418814149 - timing_s/agent_loop/slowest/generate_sequences:16.1690516689996 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.004849629000091227 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:65 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.325619559000188 - timing_s/reward:1.535899718874134e-05 - timing_s/old_log_prob:3.6500039960010326 - timing_s/ref:2.5934809719983605 - timing_s/adv:0.013301540002430556 - timing_s/update_actor:8.856736333000299 - timing_s/update_weights:1.99907849200099 - timing_s/step:33.46620050700221 - 
timing_s/stop_profile:3.7756999518023804e-05 - timing_per_token_ms/gen:0.0850252829763198 - timing_per_token_ms/ref:0.011831088011889843 - timing_per_token_ms/update_actor:0.040403160148535414 - timing_per_token_ms/adv:6.0679716628562494e-05 - perf/total_num_tokens:219209 - perf/time_per_step:33.46620050700221 - perf/throughput:1637.54023969747
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:06:09 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:06:09 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:06:09 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:06:09 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816) Training Progress:   1%|▏         | 25/1740 [16:17<19:59:42, 41.97s/it]
(TaskRunner pid=168816) step:24 - global_seqlen/min:54684 - global_seqlen/max:62868 - global_seqlen/minmax_diff:8184 - global_seqlen/balanced_min:58528 - global_seqlen/balanced_max:58529 - global_seqlen/mean:58528.5 - actor/entropy:0.5033730864524841 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.24076908826828003 - training/rollout_probs_diff_mean:0.005905128084123135 - training/rollout_probs_diff_std:0.010564177297055721 - training/rollout_actor_probs_pearson_corr:0.9992672801017761 - rollout_corr/training_ppl:1.6008656024932861 - rollout_corr/training_log_ppl:0.45879679918289185 - rollout_corr/kl:0.0006975781288929284 - rollout_corr/k3_kl:0.0007753208628855646 - rollout_corr/rollout_ppl:1.5996936559677124 - rollout_corr/rollout_log_ppl:0.45808377861976624 - rollout_corr/log_ppl_diff:0.0007130198064260185 - rollout_corr/log_ppl_abs_diff:0.0014035161584615707 - rollout_corr/log_ppl_diff_max:0.005885779857635498 - rollout_corr/log_ppl_diff_min:-0.004737287759780884 - rollout_corr/ppl_ratio:1.0007145404815674 - rollout_corr/chi2_token:0.0017403364181518555 - rollout_corr/chi2_seq:1.7014238834381104 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.04293003925704397 - actor/kl_loss:0.01311905790498713 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.04294315725564957 - actor/grad_norm:0.4543357193470001 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.86105346679688 - perf/mfu/actor:0.0 - training/global_step:24 - training/epoch:0 - critic/score/mean:0.5874999761581421 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.5874999761581421 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04293004423379898 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.04293004423379898 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:644.3875122070312 - response_length/max:1024.0 - response_length/min:200.0 - response_length/clip_ratio:0.26249998807907104 - response_length_non_aborted/mean:644.3875122070312 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:200.0 - response_length_non_aborted/clip_ratio:0.26249998807907104 - response/aborted_ratio:0.0 - prompt_length/mean:87.21875 - prompt_length/max:156.0 - prompt_length/min:51.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.7370998800033703e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.145187950998661 - timing_s/agent_loop/generate_sequences/max:16.10578345200338 - timing_s/agent_loop/generate_sequences/mean:10.052234019490744 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002603655000712024 - timing_s/agent_loop/compute_score/max:0.01137411899981089 - timing_s/agent_loop/compute_score/mean:0.004009648628186824 - timing_s/agent_loop/slowest/generate_sequences:16.101163261999318 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.009158697997918352 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:108 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.234287290000793 - timing_s/reward:1.9219998648623005e-05 - timing_s/old_log_prob:2.8293758260006143 - timing_s/ref:2.6053759759997774 - timing_s/adv:0.013502670000889339 - timing_s/update_actor:8.961267197002599 - timing_s/update_weights:2.040831791000528 - timing_s/step:32.71693260900065 - 
timing_s/stop_profile:3.809500049101189e-05 - timing_per_token_ms/gen:0.07872925496111033 - timing_per_token_ms/ref:0.011128663710840776 - timing_per_token_ms/update_actor:0.038277365715004646 - timing_per_token_ms/adv:5.767561957375184e-05 - perf/total_num_tokens:234114 - perf/time_per_step:32.71693260900065 - perf/throughput:1788.9360442029463
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:06:42 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:06:42 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:06:42 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:06:42 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(TaskRunner pid=168816) test_gen_batch meta info: {'eos_token_id': 151645, 'pad_token_id': 151643, 'recompute_log_prob': False, 'do_sample': False, 'validate': True, 'global_steps': 25}
(TaskRunner pid=168816) validation generation end
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
�[36m(TaskRunner pid=168816)�[0m 
Training Progress:   1%|▏         | 26/1740 [16:49<18:37:28, 39.12s/it]
(TaskRunner pid=168816) step:25 - global_seqlen/min:49238 - global_seqlen/max:60350 - global_seqlen/minmax_diff:11112 - global_seqlen/balanced_min:55189 - global_seqlen/balanced_max:55192 - global_seqlen/mean:55191.0 - actor/entropy:0.49370622634887695 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.23105472326278687 - training/rollout_probs_diff_mean:0.006012560799717903 - training/rollout_probs_diff_std:0.010894824750721455 - training/rollout_actor_probs_pearson_corr:0.999191403388977 - rollout_corr/training_ppl:1.6012042760849 - rollout_corr/training_log_ppl:0.45784932374954224 - rollout_corr/kl:0.0007576821953989565 - rollout_corr/k3_kl:0.0008103047148324549 - rollout_corr/rollout_ppl:1.6000044345855713 - rollout_corr/rollout_log_ppl:0.45710745453834534 - rollout_corr/log_ppl_diff:0.0007418597815558314 - rollout_corr/log_ppl_abs_diff:0.0015720067312940955 - rollout_corr/log_ppl_diff_max:0.007128119468688965 - rollout_corr/log_ppl_diff_min:-0.005157411098480225 - rollout_corr/ppl_ratio:1.0007438659667969 - rollout_corr/chi2_token:0.0017327070236206055 - rollout_corr/chi2_seq:3.504389762878418 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.04950157201164984 - actor/kl_loss:0.01578946602239739 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.04951735958456993 - actor/grad_norm:0.42938363552093506 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.85966110229492 - perf/mfu/actor:0.0 - val-aux/openai/gsm8k/reward/mean@1:0.6633813495072024 - val-core/openai/gsm8k/acc/mean@1:0.6633813495072024 - val-aux/num_turns/min:2 - val-aux/num_turns/max:2 - val-aux/num_turns/mean:2.0 - training/global_step:25 - training/epoch:0 - critic/score/mean:0.6312500238418579 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.6312500238418579 - 
critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04950156435370445 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.04950156435370445 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:602.9968872070312 - response_length/max:1024.0 - response_length/min:194.0 - response_length/clip_ratio:0.21875 - response_length_non_aborted/mean:602.9968872070312 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:194.0 - response_length_non_aborted/clip_ratio:0.21875 - response/aborted_ratio:0.0 - prompt_length/mean:86.890625 - prompt_length/max:190.0 - prompt_length/min:50.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.204899935051799e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.0624597700007143 - timing_s/agent_loop/generate_sequences/max:16.045067139999446 - timing_s/agent_loop/generate_sequences/mean:9.462711604971764 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.00269982800091384 - timing_s/agent_loop/compute_score/max:0.013206832998548634 - timing_s/agent_loop/compute_score/mean:0.0038266782249934293 - timing_s/agent_loop/slowest/generate_sequences:16.039894744000776 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.013206832998548634 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:116 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.158479688001535 - timing_s/reward:4.129200169700198e-05 - timing_s/old_log_prob:2.7795099559989467 - timing_s/ref:2.58289748199968 - 
timing_s/adv:0.012582236002344871 - timing_s/update_actor:8.907299908998539 - timing_s/update_weights:1.996198467000795 - timing_s/step:32.46625685899926 - timing_s/testing:24.872202100999857 - timing_s/stop_profile:9.127999874181114e-05 - timing_per_token_ms/gen:0.0837404821127884 - timing_per_token_ms/ref:0.011699812840860286 - timing_per_token_ms/update_actor:0.04034761061132494 - timing_per_token_ms/adv:5.699405701266905e-05 - perf/total_num_tokens:220764 - perf/time_per_step:32.46625685899926 - perf/throughput:1699.9495888822094
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:07:39 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:07:39 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:07:39 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:07:39 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   2%|▏         | 27/1740 [17:23<17:48:59, 37.44s/it]
(TaskRunner pid=168816) step:26 - global_seqlen/min:46747 - global_seqlen/max:53863 - global_seqlen/minmax_diff:7116 - global_seqlen/balanced_min:51331 - global_seqlen/balanced_max:51335 - global_seqlen/mean:51333.75 - actor/entropy:0.4659126400947571 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.3800289034843445 - training/rollout_probs_diff_mean:0.005810166243463755 - training/rollout_probs_diff_std:0.010853983461856842 - training/rollout_actor_probs_pearson_corr:0.9991939663887024 - rollout_corr/training_ppl:1.5440311431884766 - rollout_corr/training_log_ppl:0.4233675003051758 - rollout_corr/kl:0.0007341685122810304 - rollout_corr/k3_kl:0.0007732398808002472 - rollout_corr/rollout_ppl:1.5428745746612549 - rollout_corr/rollout_log_ppl:0.4226277470588684 - rollout_corr/log_ppl_diff:0.0007397696026600897 - rollout_corr/log_ppl_abs_diff:0.0015931494999676943 - rollout_corr/log_ppl_diff_max:0.0064428746700286865 - rollout_corr/log_ppl_diff_min:-0.007295787334442139 - rollout_corr/ppl_ratio:1.000741958618164 - rollout_corr/chi2_token:0.0016425848007202148 - rollout_corr/chi2_seq:2.2276573181152344 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.09091173713386524 - actor/kl_loss:0.01774664620461408 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.09092948585748672 - actor/grad_norm:0.533066987991333 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.98027420043945 - perf/mfu/actor:0.0 - training/global_step:26 - training/epoch:0 - critic/score/mean:0.690625011920929 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.690625011920929 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.09091173112392426 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.09091173112392426 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:558.265625 - response_length/max:1024.0 - response_length/min:142.0 - response_length/clip_ratio:0.16249999403953552 - response_length_non_aborted/mean:558.265625 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:142.0 - response_length_non_aborted/clip_ratio:0.16249999403953552 - response/aborted_ratio:0.0 - prompt_length/mean:83.40625 - prompt_length/max:140.0 - prompt_length/min:42.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.957599958288483e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.3777127960020152 - timing_s/agent_loop/generate_sequences/max:16.280698529997608 - timing_s/agent_loop/generate_sequences/mean:8.880475310412418 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002612274998682551 - timing_s/agent_loop/compute_score/max:0.012201731002278393 - timing_s/agent_loop/compute_score/mean:0.0035377835092617717 - timing_s/agent_loop/slowest/generate_sequences:16.280698529997608 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.0034847070019168314 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:99 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.396798786001455 - timing_s/reward:1.701200017123483e-05 - timing_s/old_log_prob:2.74925906000135 - timing_s/ref:2.544639368999924 - timing_s/adv:0.01286118400093983 - timing_s/update_actor:8.663537633001397 - timing_s/update_weights:2.054177455996978 - timing_s/step:32.4525332000012 - 
timing_s/stop_profile:4.5586999476654455e-05 - timing_per_token_ms/gen:0.0917842580872762 - timing_per_token_ms/ref:0.0123926236101976 - timing_per_token_ms/update_actor:0.04219221093822971 - timing_per_token_ms/adv:6.263512796620075e-05 - perf/total_num_tokens:205335 - perf/time_per_step:32.4525332000012 - perf/throughput:1581.8102606547245
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:08:13 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:08:13 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:08:13 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:08:13 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   2%|▏         | 28/1740 [17:55<17:06:30, 35.98s/it]
(TaskRunner pid=168816) step:27 - global_seqlen/min:53502 - global_seqlen/max:61711 - global_seqlen/minmax_diff:8209 - global_seqlen/balanced_min:57619 - global_seqlen/balanced_max:57623 - global_seqlen/mean:57620.0 - actor/entropy:0.4969606101512909 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.30133116245269775 - training/rollout_probs_diff_mean:0.005936270114034414 - training/rollout_probs_diff_std:0.010639668442308903 - training/rollout_actor_probs_pearson_corr:0.9992275834083557 - rollout_corr/training_ppl:1.598163366317749 - rollout_corr/training_log_ppl:0.45673665404319763 - rollout_corr/kl:0.0009203918161801994 - rollout_corr/k3_kl:0.0007837150478735566 - rollout_corr/rollout_ppl:1.5966589450836182 - rollout_corr/rollout_log_ppl:0.4557948112487793 - rollout_corr/log_ppl_diff:0.0009418734116479754 - rollout_corr/log_ppl_abs_diff:0.001470401999540627 - rollout_corr/log_ppl_diff_max:0.006966650485992432 - rollout_corr/log_ppl_diff_min:-0.003580033779144287 - rollout_corr/ppl_ratio:1.000943660736084 - rollout_corr/chi2_token:0.0012996196746826172 - rollout_corr/chi2_seq:0.36657142639160156 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.06840596234542318 - actor/kl_loss:0.01661144337413134 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.06842257082462311 - actor/grad_norm:0.4610898196697235 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.9737091064453 - perf/mfu/actor:0.0 - training/global_step:27 - training/epoch:0 - critic/score/mean:0.684374988079071 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.684374988079071 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.06840596348047256 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.06840596348047256 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:633.671875 - response_length/max:1024.0 - response_length/min:132.0 - response_length/clip_ratio:0.23125000298023224 - response_length_non_aborted/mean:633.671875 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:132.0 - response_length_non_aborted/clip_ratio:0.23125000298023224 - response/aborted_ratio:0.0 - prompt_length/mean:86.578125 - prompt_length/max:164.0 - prompt_length/min:46.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.289999899105169e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.1367637389994343 - timing_s/agent_loop/generate_sequences/max:16.018625526001415 - timing_s/agent_loop/generate_sequences/mean:9.903587546393794 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002722542998526478 - timing_s/agent_loop/compute_score/max:0.015013208998425398 - timing_s/agent_loop/compute_score/mean:0.0039391223220036405 - timing_s/agent_loop/slowest/generate_sequences:16.01480215599804 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.007768907002173364 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:105 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.135137560002477 - timing_s/reward:1.931899896590039e-05 - timing_s/old_log_prob:2.78938519300209 - timing_s/ref:3.611270843000966 - timing_s/adv:0.015836581998883048 - timing_s/update_actor:8.96536670400019 - timing_s/update_weights:1.9840058329973544 - timing_s/step:33.52904716899866 - 
timing_s/stop_profile:3.8747999496990815e-05 - timing_per_token_ms/gen:0.07957163141414117 - timing_per_token_ms/ref:0.015668478145613354 - timing_per_token_ms/update_actor:0.03889867539049023 - timing_per_token_ms/adv:6.871130683305731e-05 - perf/total_num_tokens:230480 - perf/time_per_step:33.52904716899866 - perf/throughput:1718.5099149872685
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:08:45 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:08:45 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:08:45 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:08:45 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   2%|▏         | 29/1740 [18:27<16:31:05, 34.75s/it]
(TaskRunner pid=168816) step:28 - global_seqlen/min:45306 - global_seqlen/max:57795 - global_seqlen/minmax_diff:12489 - global_seqlen/balanced_min:52308 - global_seqlen/balanced_max:52309 - global_seqlen/mean:52308.5 - actor/entropy:0.4769671559333801 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.4658483862876892 - training/rollout_probs_diff_mean:0.00587458536028862 - training/rollout_probs_diff_std:0.010763218626379967 - training/rollout_actor_probs_pearson_corr:0.9992130994796753 - rollout_corr/training_ppl:1.5574144124984741 - rollout_corr/training_log_ppl:0.4310193657875061 - rollout_corr/kl:0.0008154725073836744 - rollout_corr/k3_kl:0.0007719529094174504 - rollout_corr/rollout_ppl:1.5560500621795654 - rollout_corr/rollout_log_ppl:0.43015509843826294 - rollout_corr/log_ppl_diff:0.0008642712491564453 - rollout_corr/log_ppl_abs_diff:0.001552100176922977 - rollout_corr/log_ppl_diff_max:0.008824735879898071 - rollout_corr/log_ppl_diff_min:-0.005368053913116455 - rollout_corr/ppl_ratio:1.0008662939071655 - rollout_corr/chi2_token:0.0014683008193969727 - rollout_corr/chi2_seq:1.8509140014648438 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.05887504993006587 - actor/kl_loss:0.01964174729801016 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.05889469385147095 - actor/grad_norm:0.41483402252197266 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.92419815063477 - perf/mfu/actor:0.0 - training/global_step:28 - training/epoch:0 - critic/score/mean:0.675000011920929 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.675000011920929 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.05887504667043686 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.05887504667043686 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:569.9656372070312 - response_length/max:1024.0 - response_length/min:154.0 - response_length/clip_ratio:0.19374999403953552 - response_length_non_aborted/mean:569.9656372070312 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:154.0 - response_length_non_aborted/clip_ratio:0.19374999403953552 - response/aborted_ratio:0.0 - prompt_length/mean:83.890625 - prompt_length/max:178.0 - prompt_length/min:42.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.174199759494513e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.5478495309980644 - timing_s/agent_loop/generate_sequences/max:16.349100317002012 - timing_s/agent_loop/generate_sequences/mean:8.961940980124973 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002679038003407186 - timing_s/agent_loop/compute_score/max:0.012988074999157107 - timing_s/agent_loop/compute_score/mean:0.003633487625006637 - timing_s/agent_loop/slowest/generate_sequences:16.349100317002012 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.004687337001087144 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:178 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.462789862001955 - timing_s/reward:1.4299999747890979e-05 - timing_s/old_log_prob:2.7384528789989417 - timing_s/ref:2.5542796379995707 - timing_s/adv:0.01294942399908905 - timing_s/update_actor:8.76329436600281 - timing_s/update_weights:1.9849439350000466 - timing_s/step:32.546794959998806 - 
timing_s/stop_profile:3.554399881977588e-05 - timing_per_token_ms/gen:0.09026196679625391 - timing_per_token_ms/ref:0.012207765649940118 - timing_per_token_ms/update_actor:0.04188274547159071 - timing_per_token_ms/adv:6.18896737580367e-05 - perf/total_num_tokens:209234 - perf/time_per_step:32.546794959998806 - perf/throughput:1607.1782202914003
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:09:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:09:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:09:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:09:17 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   2%|▏         | 30/1740 [19:24<19:39:46, 41.40s/it]
(TaskRunner pid=168816) step:29 - global_seqlen/min:47236 - global_seqlen/max:61149 - global_seqlen/minmax_diff:13913 - global_seqlen/balanced_min:52422 - global_seqlen/balanced_max:52423 - global_seqlen/mean:52422.75 - actor/entropy:0.46949806809425354 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.2135259509086609 - training/rollout_probs_diff_mean:0.005826863460242748 - training/rollout_probs_diff_std:0.010664447210729122 - training/rollout_actor_probs_pearson_corr:0.9992198944091797 - rollout_corr/training_ppl:1.5460187196731567 - rollout_corr/training_log_ppl:0.42518606781959534 - rollout_corr/kl:0.0008716813754290342 - rollout_corr/k3_kl:0.0007588762673549354 - rollout_corr/rollout_ppl:1.5446584224700928 - rollout_corr/rollout_log_ppl:0.4243393838405609 - rollout_corr/log_ppl_diff:0.0008466839790344238 - rollout_corr/log_ppl_abs_diff:0.001493286108598113 - rollout_corr/log_ppl_diff_max:0.0062032341957092285 - rollout_corr/log_ppl_diff_min:-0.003215193748474121 - rollout_corr/ppl_ratio:1.0008485317230225 - rollout_corr/chi2_token:0.00130462646484375 - rollout_corr/chi2_seq:0.272318959236145 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.062392897976678796 - actor/kl_loss:0.021421577912406065 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.062414318323135376 - actor/grad_norm:0.4430452287197113 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.93898391723633 - perf/mfu/actor:0.0 - training/global_step:29 - training/epoch:0 - critic/score/mean:0.690625011920929 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.690625011920929 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.06239289790391922 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.06239289790391922 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:572.2374877929688 - response_length/max:1024.0 - response_length/min:215.0 - response_length/clip_ratio:0.20000000298023224 - response_length_non_aborted/mean:572.2374877929688 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:215.0 - response_length_non_aborted/clip_ratio:0.20000000298023224 - response/aborted_ratio:0.0 - prompt_length/mean:83.046875 - prompt_length/max:145.0 - prompt_length/min:53.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.664099833462387e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:3.4036702439989313 - timing_s/agent_loop/generate_sequences/max:15.788307540999085 - timing_s/agent_loop/generate_sequences/mean:8.88038491969063 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002599923998786835 - timing_s/agent_loop/compute_score/max:0.010152423001272837 - timing_s/agent_loop/compute_score/mean:0.0036312560093506364 - timing_s/agent_loop/slowest/generate_sequences:15.788307540999085 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.0036500669993984047 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:104 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:15.90046066099967 - timing_s/reward:2.0736999431392178e-05 - timing_s/old_log_prob:2.726835571000265 - timing_s/ref:2.5641987129965855 - timing_s/adv:0.012717495003016666 - timing_s/update_actor:8.711745931999758 - timing_s/update_weights:1.9528152080019936 - timing_s/step:31.89986537599907 - 
timing_s/stop_profile:4.983399776392616e-05 - timing_per_token_ms/gen:0.08683272166823036 - timing_per_token_ms/ref:0.012228463372279142 - timing_per_token_ms/update_actor:0.04154563587373687 - timing_per_token_ms/adv:6.064874030366905e-05 - perf/total_num_tokens:209691 - perf/time_per_step:31.89986537599907 - perf/throughput:1643.353330244522
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:09:49 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:09:49 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:09:49 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:09:49 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(TaskRunner pid=168816) test_gen_batch meta info: {'eos_token_id': 151645, 'pad_token_id': 151643, 'recompute_log_prob': False, 'do_sample': False, 'validate': True, 'global_steps': 30}
(TaskRunner pid=168816) validation generation end
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   2%|▏         | 31/1740 [19:57<18:21:46, 38.68s/it]
�[36m(TaskRunner pid=168816)�[0m step:30 - global_seqlen/min:46852 - global_seqlen/max:52183 - global_seqlen/minmax_diff:5331 - global_seqlen/balanced_min:49703 - global_seqlen/balanced_max:49706 - global_seqlen/mean:49704.75 - actor/entropy:0.469163179397583 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.21414899826049805 - training/rollout_probs_diff_mean:0.005816871300339699 - training/rollout_probs_diff_std:0.010764050297439098 - training/rollout_actor_probs_pearson_corr:0.9992043972015381 - rollout_corr/training_ppl:1.5423425436019897 - rollout_corr/training_log_ppl:0.4217142164707184 - rollout_corr/kl:0.0008268850506283343 - rollout_corr/k3_kl:0.0007825488573871553 - rollout_corr/rollout_ppl:1.5410208702087402 - rollout_corr/rollout_log_ppl:0.42086490988731384 - rollout_corr/log_ppl_diff:0.0008492710185237229 - rollout_corr/log_ppl_abs_diff:0.0016384271439164877 - rollout_corr/log_ppl_diff_max:0.006505802273750305 - rollout_corr/log_ppl_diff_min:-0.005738705396652222 - rollout_corr/ppl_ratio:1.0008513927459717 - rollout_corr/chi2_token:0.0014990568161010742 - rollout_corr/chi2_seq:1.0719878673553467 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.05567136717218091 - actor/kl_loss:0.022307425562758 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.055693671107292175 - actor/grad_norm:0.41337600350379944 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:245.9753074645996 - perf/mfu/actor:0.0 - val-aux/openai/gsm8k/reward/mean@1:0.6777862016679302 - val-core/openai/gsm8k/acc/mean@1:0.6777862016679302 - val-aux/num_turns/min:2 - val-aux/num_turns/max:2 - val-aux/num_turns/mean:2.0 - training/global_step:30 - training/epoch:0 - critic/score/mean:0.699999988079071 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.699999988079071 - 
critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.05567138269543648 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.05567138269543648 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:539.5281372070312 - response_length/max:1024.0 - response_length/min:122.0 - response_length/clip_ratio:0.15937499701976776 - response_length_non_aborted/mean:539.5281372070312 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:122.0 - response_length_non_aborted/clip_ratio:0.15937499701976776 - response/aborted_ratio:0.0 - prompt_length/mean:81.78125 - prompt_length/max:147.0 - prompt_length/min:50.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.276699812384322e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:1.9796044850008911 - timing_s/agent_loop/generate_sequences/max:16.28821057800087 - timing_s/agent_loop/generate_sequences/mean:8.470853306428136 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002634427000884898 - timing_s/agent_loop/compute_score/max:0.01324115600073128 - timing_s/agent_loop/compute_score/mean:0.0037254017561963336 - timing_s/agent_loop/slowest/generate_sequences:16.280659372001537 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.01324115600073128 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:128 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.397280042998318 - timing_s/reward:1.4389999705599621e-05 - timing_s/old_log_prob:2.6791540550002537 - timing_s/ref:2.5160814610026137 - 
timing_s/adv:0.012741031001496594 - timing_s/update_actor:8.68666180099899 - timing_s/update_weights:2.022027674000128 - timing_s/step:32.341808184999536 - timing_s/testing:24.542288245000236 - timing_s/stop_profile:9.043300087796524e-05 - timing_per_token_ms/gen:0.0949746598184659 - timing_per_token_ms/ref:0.012655135882398633 - timing_per_token_ms/update_actor:0.0436913061679165 - timing_per_token_ms/adv:6.408356847935356e-05 - perf/total_num_tokens:198819 - perf/time_per_step:32.341808184999536 - perf/throughput:1536.857485384926
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:10:46 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:10:46 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:10:46 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:10:46 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   2%|▏         | 32/1740 [20:30<17:33:53, 37.02s/it]
�[36m(TaskRunner pid=168816)�[0m step:31 - global_seqlen/min:47819 - global_seqlen/max:60067 - global_seqlen/minmax_diff:12248 - global_seqlen/balanced_min:53784 - global_seqlen/balanced_max:53787 - global_seqlen/mean:53786.0 - actor/entropy:0.4709254503250122 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.24545535445213318 - training/rollout_probs_diff_mean:0.0057618506252765656 - training/rollout_probs_diff_std:0.01066435407847166 - training/rollout_actor_probs_pearson_corr:0.9992364048957825 - rollout_corr/training_ppl:1.5652716159820557 - rollout_corr/training_log_ppl:0.4350922107696533 - rollout_corr/kl:0.0007489005802199244 - rollout_corr/k3_kl:0.0007612535264343023 - rollout_corr/rollout_ppl:1.564142107963562 - rollout_corr/rollout_log_ppl:0.4343784749507904 - rollout_corr/log_ppl_diff:0.0007136977510526776 - rollout_corr/log_ppl_abs_diff:0.0014298844616860151 - rollout_corr/log_ppl_diff_max:0.007002353668212891 - rollout_corr/log_ppl_diff_min:-0.00696951150894165 - rollout_corr/ppl_ratio:1.0007154941558838 - rollout_corr/chi2_token:0.0015562772750854492 - rollout_corr/chi2_seq:1.97922945022583 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.05996635038172826 - actor/kl_loss:0.023327639850322157 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.05998967960476875 - actor/grad_norm:0.42521512508392334 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:246.1760597229004 - perf/mfu/actor:0.0 - training/global_step:31 - training/epoch:0 - critic/score/mean:0.65625 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.65625 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.05996635556221008 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.05996635556221008 - 
critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:593.2781372070312 - response_length/max:1024.0 - response_length/min:168.0 - response_length/clip_ratio:0.19062499701976776 - response_length_non_aborted/mean:593.2781372070312 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:168.0 - response_length_non_aborted/clip_ratio:0.19062499701976776 - response/aborted_ratio:0.0 - prompt_length/mean:79.046875 - prompt_length/max:146.0 - prompt_length/min:46.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.678699770011008e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.70103316499808 - timing_s/agent_loop/generate_sequences/max:16.015834177000215 - timing_s/agent_loop/generate_sequences/mean:9.282684985709375 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.002722618999541737 - timing_s/agent_loop/compute_score/max:0.012344935999863083 - timing_s/agent_loop/compute_score/mean:0.0036251944781724886 - timing_s/agent_loop/slowest/generate_sequences:16.015834177000215 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.0031003139993117657 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:93 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.138691787000425 - timing_s/reward:2.2452000848716125e-05 - timing_s/old_log_prob:2.7982891539977572 - timing_s/ref:2.571457394002209 - timing_s/adv:0.012986169000214431 - timing_s/update_actor:8.78972760400211 - timing_s/update_weights:1.990655223999056 - timing_s/step:32.34322520799833 - timing_s/stop_profile:4.506199911702424e-05 - 
timing_per_token_ms/gen:0.0850080421124179 - timing_per_token_ms/ref:0.011952261713095458 - timing_per_token_ms/update_actor:0.04085509056260974 - timing_per_token_ms/adv:6.036035864450987e-05 - perf/total_num_tokens:215144 - perf/time_per_step:32.34322520799833 - perf/throughput:1662.9757748061245
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:11:19 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:11:19 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:11:19 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:11:19 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   2%|▏         | 33/1740 [21:01<16:45:39, 35.35s/it]
�[36m(TaskRunner pid=168816)�[0m step:32 - global_seqlen/min:47951 - global_seqlen/max:56778 - global_seqlen/minmax_diff:8827 - global_seqlen/balanced_min:52181 - global_seqlen/balanced_max:52185 - global_seqlen/mean:52182.5 - actor/entropy:0.46357816457748413 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.24424585700035095 - training/rollout_probs_diff_mean:0.005731112323701382 - training/rollout_probs_diff_std:0.010689939372241497 - training/rollout_actor_probs_pearson_corr:0.9992147088050842 - rollout_corr/training_ppl:1.5434741973876953 - rollout_corr/training_log_ppl:0.42330366373062134 - rollout_corr/kl:0.000799682573415339 - rollout_corr/k3_kl:0.0007637348608113825 - rollout_corr/rollout_ppl:1.5421963930130005 - rollout_corr/rollout_log_ppl:0.4224731922149658 - rollout_corr/log_ppl_diff:0.0008304804796352983 - rollout_corr/log_ppl_abs_diff:0.0015954530099406838 - rollout_corr/log_ppl_diff_max:0.006294339895248413 - rollout_corr/log_ppl_diff_min:-0.004976063966751099 - rollout_corr/ppl_ratio:1.0008325576782227 - rollout_corr/chi2_token:0.0014684200286865234 - rollout_corr/chi2_seq:1.480886459350586 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.07276782239205204 - actor/kl_loss:0.02486395652522333 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.07279268652200699 - actor/grad_norm:0.4402347505092621 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:246.13925170898438 - perf/mfu/actor:0.0 - training/global_step:32 - training/epoch:0 - critic/score/mean:0.671875 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.671875 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.07276783138513565 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - critic/returns/mean:-0.07276783138513565 - 
critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:568.296875 - response_length/max:1024.0 - response_length/min:165.0 - response_length/clip_ratio:0.19062499701976776 - response_length_non_aborted/mean:568.296875 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:165.0 - response_length_non_aborted/clip_ratio:0.19062499701976776 - response/aborted_ratio:0.0 - prompt_length/mean:83.984375 - prompt_length/max:139.0 - prompt_length/min:50.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:4.965900006936863e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.646087835000799 - timing_s/agent_loop/generate_sequences/max:16.027627368999674 - timing_s/agent_loop/generate_sequences/mean:8.871634894956276 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0026671160012483597 - timing_s/agent_loop/compute_score/max:0.009016593998239841 - timing_s/agent_loop/compute_score/mean:0.0036179206813471863 - timing_s/agent_loop/slowest/generate_sequences:16.027627368999674 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.005302262001350755 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:101 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:16.13418066399754 - timing_s/reward:2.0714000129373744e-05 - timing_s/old_log_prob:3.6673050979989057 - timing_s/ref:2.557326738999109 - timing_s/adv:0.014888720997987548 - timing_s/update_actor:8.721506083002168 - timing_s/update_weights:2.021190506999119 - timing_s/step:33.14506668299873 - timing_s/stop_profile:3.649599966593087e-05 - 
timing_per_token_ms/gen:0.08872002784634758 - timing_per_token_ms/ref:0.012251840842232112 - timing_per_token_ms/update_actor:0.041783673084856844 - timing_per_token_ms/adv:7.133004837822808e-05 - perf/total_num_tokens:208730 - perf/time_per_step:33.14506668299873 - perf/throughput:1574.3670241811956
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:11:51 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:11:51 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:11:51 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:11:51 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(TaskRunner pid=168816)
Training Progress:   2%|▏         | 34/1740 [21:33<16:16:58, 34.36s/it]
�[36m(TaskRunner pid=168816)�[0m step:33 - global_seqlen/min:41787 - global_seqlen/max:55060 - global_seqlen/minmax_diff:13273 - global_seqlen/balanced_min:46950 - global_seqlen/balanced_max:46954 - global_seqlen/mean:46952.5 - actor/entropy:0.4453568756580353 - perf/mfu/actor_infer:0.0 - training/rollout_probs_diff_valid:1 - training/rollout_probs_diff_max:0.22922007739543915 - training/rollout_probs_diff_mean:0.005860774777829647 - training/rollout_probs_diff_std:0.010971582494676113 - training/rollout_actor_probs_pearson_corr:0.9991459250450134 - rollout_corr/training_ppl:1.5131886005401611 - rollout_corr/training_log_ppl:0.4064783453941345 - rollout_corr/kl:0.0008526794845238328 - rollout_corr/k3_kl:0.0007880449411459267 - rollout_corr/rollout_ppl:1.5119125843048096 - rollout_corr/rollout_log_ppl:0.40563035011291504 - rollout_corr/log_ppl_diff:0.0008480483666062355 - rollout_corr/log_ppl_abs_diff:0.0015744874253869057 - rollout_corr/log_ppl_diff_max:0.007964730262756348 - rollout_corr/log_ppl_diff_min:-0.004110515117645264 - rollout_corr/ppl_ratio:1.000849962234497 - rollout_corr/chi2_token:0.0014585256576538086 - rollout_corr/chi2_seq:0.677154541015625 - actor/pg_clipfrac:0.0 - actor/ppo_kl:0.0 - actor/pg_clipfrac_lower:0.0 - actor/pg_loss:0.04408613292071095 - actor/kl_loss:0.031885975564364344 - actor/kl_coef:0.0010000000000000002 - actor/loss:0.044118016958236694 - actor/grad_norm:0.4416860044002533 - actor/lr:1e-06 - actor/perf/max_memory_allocated_gb:11.610264778137207 - actor/perf/max_memory_reserved_gb:64.8828125 - actor/perf/cpu_memory_used_gb:246.0670166015625 - perf/mfu/actor:0.0 - training/global_step:33 - training/epoch:0 - critic/score/mean:0.699999988079071 - critic/score/max:1.0 - critic/score/min:0.0 - critic/rewards/mean:0.699999988079071 - critic/rewards/max:1.0 - critic/rewards/min:0.0 - critic/advantages/mean:-0.04408613592386246 - critic/advantages/max:1.7888504266738892 - critic/advantages/min:-1.7888504266738892 - 
critic/returns/mean:-0.04408613592386246 - critic/returns/max:1.7888504266738892 - critic/returns/min:-1.7888504266738892 - response_length/mean:507.421875 - response_length/max:1024.0 - response_length/min:167.0 - response_length/clip_ratio:0.10625000298023224 - response_length_non_aborted/mean:507.421875 - response_length_non_aborted/max:1024.0 - response_length_non_aborted/min:167.0 - response_length_non_aborted/clip_ratio:0.10625000298023224 - response/aborted_ratio:0.0 - prompt_length/mean:79.484375 - prompt_length/max:121.0 - prompt_length/min:45.0 - prompt_length/clip_ratio:0.0 - num_turns/min:2 - num_turns/max:2 - num_turns/mean:2.0 - timing_s/start_profile:5.1751001592492685e-05 - timing_s/agent_loop/num_preempted/min:-1 - timing_s/agent_loop/num_preempted/max:-1 - timing_s/agent_loop/num_preempted/mean:-1.0 - timing_s/agent_loop/generate_sequences/min:2.6494803189998493 - timing_s/agent_loop/generate_sequences/max:15.667131416001212 - timing_s/agent_loop/generate_sequences/mean:7.886873671431237 - timing_s/agent_loop/tool_calls/min:0.0 - timing_s/agent_loop/tool_calls/max:0.0 - timing_s/agent_loop/tool_calls/mean:0.0 - timing_s/agent_loop/compute_score/min:0.0025968180016207043 - timing_s/agent_loop/compute_score/max:0.006390867001755396 - timing_s/agent_loop/compute_score/mean:0.0034748475064134256 - timing_s/agent_loop/slowest/generate_sequences:15.667131416001212 - timing_s/agent_loop/slowest/tool_calls:0.0 - timing_s/agent_loop/slowest/compute_score:0.0030980660012573935 - timing_s/agent_loop/slowest/num_preempted:-1 - timing_s/agent_loop/slowest/prompt_length:97 - timing_s/agent_loop/slowest/response_length:1024 - timing_s/gen:15.77865515600206 - timing_s/reward:1.5855999663472176e-05 - timing_s/old_log_prob:2.6667424549996213 - timing_s/ref:2.511228076000407 - timing_s/adv:0.012652785997488536 - timing_s/update_actor:8.490660668998316 - timing_s/update_weights:1.9391714880002837 - timing_s/step:31.427358791999723 - 
timing_s/stop_profile:4.928099951939657e-05 - timing_per_token_ms/gen:0.09717416570286103 - timing_per_token_ms/ref:0.013371109504288414 - timing_per_token_ms/update_actor:0.04520877838772332 - timing_per_token_ms/adv:6.737014002176953e-05 - perf/total_num_tokens:187810 - perf/time_per_step:31.427358791999723 - perf/throughput:1494.0008261830906
(vLLMHttpServer pid=172375) (EngineCore_DP0 pid=174294) WARNING 06-23 09:12:23 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172376) (EngineCore_DP0 pid=174158) WARNING 06-23 09:12:23 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172377) (EngineCore_DP0 pid=174151) WARNING 06-23 09:12:23 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172378) (EngineCore_DP0 pid=174141) WARNING 06-23 09:12:23 [scheduler.py:1568] reset_connector called but no KV connector is configured.
(vLLMHttpServer pid=172375) (Worker pid=174561) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172376) (Worker pid=174416) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172377) (Worker pid=174406) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0
(vLLMHttpServer pid=172378) (Worker pid=174391) INFO:/usr/local/lib/python3.12/dist-packages/verl/workers/rollout/vllm_rollout/utils.py:Loading standard weights (non-FP8, async), loaded_params: 311, loaded_buffers: 0

gongxijun and others added 3 commits June 23, 2026 17:18
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@@ -0,0 +1,55 @@
# Enflame GCU User Guide

Last updated: 06/22/2026.

@heavyrain-lzy heavyrain-lzy Jun 24, 2026

Collaborator

The user guide is too simple. You can refer to #3. Ensure that users can start training by following the instructions.

logger.debug("MetaX Megatron engines not registered: %s", e)

# Enflame GCU engines (ECCL/FlagCX communication)
ensure_enflame_engines_registered()

Collaborator

It's not required.

| PyTorch API | `torch.gcu` (via `torch_gcu`) |
| Communication backend | `eccl` (default) or `flagcx` (when `USE_FLAGCX=1`) |
| Device visibility env var | `TOPS_VISIBLE_DEVICES` |
| Ray resource name | `GPU` (built-in) |
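The backend and visibility rows above could be exercised with something like the following. This is only a minimal sketch: `select_comm_backend` and `visible_device_ids` are hypothetical helper names, not functions from this PR, and the comma-separated format for `TOPS_VISIBLE_DEVICES` is an assumption modelled on `CUDA_VISIBLE_DEVICES`.

```python
import os
from typing import List, Mapping, Optional


def select_comm_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: 'flagcx' when USE_FLAGCX=1, otherwise 'eccl'."""
    env = os.environ if env is None else env
    return "flagcx" if env.get("USE_FLAGCX") == "1" else "eccl"


def visible_device_ids(env: Optional[Mapping[str, str]] = None) -> Optional[List[int]]:
    """Hypothetical helper: parse TOPS_VISIBLE_DEVICES into a list of indices.

    Assumes a comma-separated list of integers; returns None when the
    variable is unset or empty (no restriction requested).
    """
    env = os.environ if env is None else env
    raw = env.get("TOPS_VISIBLE_DEVICES")
    if raw is None or not raw.strip():
        return None
    return [int(tok) for tok in raw.split(",") if tok.strip()]


if __name__ == "__main__":
    print("backend:", select_comm_backend())
    print("visible devices:", visible_device_ids())
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helpers easy to unit-test without touching the process environment.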

Collaborator

I have added end-to-end (E2E) validation coverage in #5. Please run the script https://github.com/verl-project/verl-hardware-plugin/blob/main/scripts/baseline_grpo_gsm8k.sh and compare the results with the chart at https://swanlab.cn/@heavyrain/verl_grpo_gsm8k_math/runs/8h196r8o/chart
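
For that comparison, the per-step `critic/rewards/mean` values can be pulled out of a console log like the one pasted in this PR. The following is a rough sketch, assuming the `step:N - key:value - key:value` layout seen in that log; `extract_metric` is a hypothetical name, not part of verl or this plugin.

```python
import re
from typing import Dict

# Match the leading "step:N - " marker only. The lookbehind skips keys such as
# "training/global_step:30", which would otherwise look like a new step record.
STEP_RE = re.compile(r"(?<![\w/])step:(\d+) - ")


def extract_metric(log_text: str, metric: str = "critic/rewards/mean") -> Dict[int, float]:
    """Return {step: value} for one metric from a verl-style console log.

    A step record may be wrapped over several lines, so each record is taken
    as everything between one "step:N - " marker and the next.
    """
    metric_re = re.compile(re.escape(metric) + r":(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")
    markers = list(STEP_RE.finditer(log_text))
    values: Dict[int, float] = {}
    for i, marker in enumerate(markers):
        end = markers[i + 1].start() if i + 1 < len(markers) else len(log_text)
        found = metric_re.search(log_text, marker.end(), end)
        if found:
            values[int(marker.group(1))] = float(found.group(1))
    return values


if __name__ == "__main__":
    sample = (
        "(TaskRunner pid=168816) step:30 - actor/lr:1e-06 - training/global_step:30 - "
        "critic/rewards/mean:0.699999988079071 - \n"
        "critic/rewards/max:1.0\n"
        "(TaskRunner pid=168816) step:31 - training/global_step:31 - "
        "critic/rewards/mean:0.65625 - critic/rewards/max:1.0\n"
    )
    for step, value in sorted(extract_metric(sample).items()):
        print(step, value)
```

The resulting mapping can then be plotted or diffed against values exported from the reference run.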

Comment thread verl_hardware_plugin/platforms/platform_enflame.py