Why is there a quote? #87

Description

@LukeLIN-web

--gradient_merge_steps $(expr 67584 \/ $batch_size \/ 8)"

There is an unmatched quote at the end of this line.
I modified the command as follows:

$CMD        --max_predictions_per_seq 80 \
            --learning_rate 5e-5 \
            --weight_decay 0.0 \
            --adam_epsilon 1e-8 \
            --warmup_steps 0 \
            --output_dir ./tmp2/ \
            --logging_steps 10 \
            --save_steps 20000 \
            --input_dir=$DATA_DIR \
            --model_type bert \
            --model_name_or_path bert-base-uncased \
            --batch_size ${batch_size} \
            --use_amp ${use_amp} \
            --gradient_merge_steps $(expr 67584 \/ $batch_size \/ 8)
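
For reference, the `expr` call above is plain integer division, so the value passed to --gradient_merge_steps can be checked by hand. A minimal sketch follows; reading the 8 as the number of cards is an assumption, and the function and parameter names are made up for illustration.

```python
def gradient_merge_steps(batch_size, total=67584, divisor=8):
    # Mirrors `expr 67584 / $batch_size / 8`, which uses integer division.
    # `total` and `divisor` are taken from the command; treating `divisor`
    # as the number of cards is an assumption, not something the script states.
    return total // batch_size // divisor


print(gradient_merge_steps(32))  # 67584 // 32 = 2112, then 2112 // 8 = 264
print(gradient_merge_steps(96))  # 67584 // 96 = 704, then 704 // 8 = 88
```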

With the quote removed, another problem appears:
Traceback (most recent call last):
  File "./run_pretrain.py", line 439, in <module>
    do_train(args)
  File "./run_pretrain.py", line 316, in do_train
    train_data_loader) * args.num_train_epochs
UnboundLocalError: local variable 'train_data_loader' referenced before assignment
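
This kind of error usually means the name is assigned only inside a loop or branch that never ran. The sketch below reproduces the pattern in isolation. It is not the code from run_pretrain.py; `count_steps` and its arguments are hypothetical, and an empty file list (for example, no training files found under --input_dir) is only one possible cause.

```python
def count_steps(files, num_train_epochs):
    # Hypothetical stand-in for the failing part of do_train.
    for f in files:
        # The loader is assigned only inside the loop body.
        train_data_loader = [f] * 4
    # If `files` is empty, the loop body never runs and the name is unbound.
    return len(train_data_loader) * num_train_epochs


try:
    count_steps([], 1)
except UnboundLocalError as err:
    print("reproduced:", err)

print(count_steps(["part-0"], 2))  # 4 batches * 2 epochs = 8
```

If this is the cause, checking that the input directory actually contains the files the script expects would be a reasonable first step.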

I used https://github.com/PaddlePaddle/Perf/blob/master/Bert/scripts/paddle_base_pre_training.sh
This shell script worked.

What's more, how do I get the training throughput (sequences/sec) for eight cards?
Do I add up all eight worklogs? Is there a quick way to sum them?
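
If each worker log reports a per-card rate, one way to sum them quickly is a small script like the sketch below. The log line format, the regular expression, and the sample numbers are all hypothetical (they are not real measurements), and whether the per-card values should be summed depends on what the logs actually report.

```python
import re
from statistics import mean

# Hypothetical pattern: assumes lines such as "... speed: 123.4 sequences/s".
# Adjust it to the real worker log format.
RATE = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*sequences/s")


def mean_rate(log_text):
    # Average every per-card rate found in one worker log.
    rates = [float(value) for value in RATE.findall(log_text)]
    return mean(rates) if rates else 0.0


def total_throughput(log_texts):
    # Sum the per-card averages across all worker logs.
    return sum(mean_rate(text) for text in log_texts)


# Made-up sample logs for two cards, used only to exercise the functions.
sample_logs = [
    "step 10 speed: 100.0 sequences/s\nstep 20 speed: 110.0 sequences/s\n",
    "step 10 speed: 90.0 sequences/s\nstep 20 speed: 100.0 sequences/s\n",
]
print(total_throughput(sample_logs))  # 105.0 + 95.0 = 200.0
```

For a real run, the contents of the eight worker log files would be read into the list passed to `total_throughput`.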
