Question about the max seq length #8

# 🖥 Benchmarking `transformers`
Hi there,

When I run one of the examples in the text-classification folder and pass `max_seq_length=1024`, I get the following warning: `WARNING - __main__ - The max_seq_length passed (1024) is larger than the maximum length for the model (512). Using max_seq_length=512.`
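The warning suggests the script caps the requested length at the tokenizer's limit before tokenizing. Below is a minimal sketch of that check. It assumes the limit is the tokenizer's `model_max_length` (512 for `bert-base-cased`). `resolve_max_seq_length` is a hypothetical helper for illustration, not a function in `run_glue.py`.

```python
import logging

logger = logging.getLogger(__name__)


def resolve_max_seq_length(requested: int, model_max_length: int) -> int:
    """Hypothetical helper: clamp the requested length to the model's limit."""
    if requested > model_max_length:
        # Same wording as the warning quoted in this issue.
        logger.warning(
            "The max_seq_length passed (%d) is larger than the maximum length "
            "for the model (%d). Using max_seq_length=%d.",
            requested, model_max_length, model_max_length,
        )
    # Never use more than the model supports.
    return min(requested, model_max_length)


print(resolve_max_seq_length(1024, 512))  # 512 (and logs the warning)
print(resolve_max_seq_length(128, 512))   # 128
```

Under this reading, passing any value above 512 to `bert-base-cased` silently falls back to 512 rather than failing.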

## Set-up
I'm running on a GPU node with the following command:

```bash
python ./examples/text-classification/run_glue.py \
  --model_name_or_path bert-base-cased \
  --task_name mrpc \
  --do_train \
  --do_eval \
  --max_seq_length 1024 \
  --per_device_train_batch_size 8 \
  --learning_rate 2e-5 \
  --num_train_epochs 1 \
  --overwrite_output_dir \
  --output_dir /tmp/mrpc/
```
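`--max_seq_length` sets the fixed length that every tokenized example is cut or padded to. Here is a rough sketch of that truncate-then-pad step on a list of token ids. It assumes padding up to the maximum length with a pad id of 0 (BERT's `[PAD]` id). `truncate_and_pad` is hypothetical, and it ignores special tokens and sentence-pair truncation strategies, which the real tokenizer handles.

```python
from typing import List, Tuple


def truncate_and_pad(
    token_ids: List[int], max_seq_length: int, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: cut or pad token ids to exactly max_seq_length.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens
    and 0 for padding positions.
    """
    ids = token_ids[:max_seq_length]  # drop tokens past the limit
    mask = [1] * len(ids)
    padding = max_seq_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


# Made-up token ids, just to show the shapes.
ids, mask = truncate_and_pad([5, 6, 7, 8], max_seq_length=6)
print(ids)   # [5, 6, 7, 8, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```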

The run still completes and produces output, but it uses max_seq_length=512 instead of 1024.
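One likely reason for the cap is that BERT uses learned absolute position embeddings. The `bert-base-cased` config sets `max_position_embeddings` to 512, so the pretrained weights contain no vectors for positions 512 and above. Below is a toy sketch of that lookup. Plain Python lists stand in for the real embedding matrix, and the tiny hidden size is an illustrative assumption.

```python
MAX_POSITION_EMBEDDINGS = 512  # value in the bert-base-cased config
HIDDEN_SIZE = 4                # toy size; the real model uses 768

# One learned vector per position; placeholder values here.
position_table = [[float(pos)] * HIDDEN_SIZE for pos in range(MAX_POSITION_EMBEDDINGS)]


def position_embeddings(seq_len: int):
    """Return one vector per position, failing when the table runs out."""
    if seq_len > len(position_table):
        raise ValueError(
            f"sequence length {seq_len} exceeds {len(position_table)} position embeddings"
        )
    return position_table[:seq_len]


print(len(position_embeddings(512)))  # 512
try:
    position_embeddings(1024)
except ValueError as err:
    print(err)  # sequence length 1024 exceeds 512 position embeddings
```

Under this assumption, using 1024 tokens would need either a checkpoint pretrained with a larger position table or new position vectors that the pretrained model has never trained.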

I'm wondering whether this is because the model is still limited to 512 tokens by memory requirements, like most Transformer and BERT-based models, or whether it comes from the configuration used during pre-training. Also, the paper mentions two settings, and one of them uses 1024. How can I get a pretrained model with max_seq_length=1024? Thanks!
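On the memory side, self-attention builds an L×L score matrix per head per layer, so doubling the sequence length quadruples that part of activation memory. Below is a back-of-the-envelope sketch. It assumes `bert-base-cased`'s 12 layers and 12 heads, the batch size of 8 from the command above, and fp32 scores. It counts only one score matrix per head; real training stores more, such as softmax outputs for the backward pass.

```python
def attention_score_bytes(
    batch: int, layers: int, heads: int, seq_len: int, bytes_per_value: int = 4
) -> int:
    """Bytes for one seq_len x seq_len attention score matrix per head, per layer."""
    return batch * layers * heads * seq_len * seq_len * bytes_per_value


for seq_len in (512, 1024):
    gib = attention_score_bytes(batch=8, layers=12, heads=12, seq_len=seq_len) / 2**30
    print(f"seq_len={seq_len}: {gib:.3f} GiB")
# seq_len=512: 1.125 GiB
# seq_len=1024: 4.500 GiB
```

Memory cost grows quickly with length. However, the warning appears to be produced by comparing against the model's configured limit, not by running out of GPU memory.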
