(cropa) root@autodl-container-00ff4cb9d9-efc66e73:~/autodl-tmp/Revisting_CroPA-main# python main.py
use specified gpu 0
model_name is: blip2
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 2.70it/s]
blip2
target_token_len is: 2
0%| | 0/22 [00:00<?, ?it/s]item is: {'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=640x426 at 0x7FD9BFAFB5E0>, 'question': 'What is in front of the giraffes?', 'answers': ['tree', 'tree', 'tree', 'tree', 'tree', 'tree', 'tree', 'tree', 'trees', 'trees'], 'question_id': 25000}
total_ques_list size is: 10
item_images is: [[<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=640x426 at 0x7FD9BFAFB5E0>]]
test_item_images is: [[<PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=640x426 at 0x7FD9BFAFB5E0>]]
/root/autodl-tmp/Revisting_CroPA-main/main.py:310: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
text_perturb = torch.tensor(perturb_list[text_idx],requires_grad=True,device=device)
0%| | 0/22 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/autodl-tmp/Revisting_CroPA-main/main.py", line 568, in <module>
attack(
File "/root/autodl-tmp/Revisting_CroPA-main/main.py", line 329, in attack
output = eval_model.model(
File "/root/miniconda3/envs/cropa/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/cropa/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/cropa/lib/python3.9/site-packages/transformers/models/blip_2/modeling_blip_2.py", line 2124, in forward
outputs = self.language_model(
TypeError: OPTForCausalLM(
(model): OPTModel(
(decoder): OPTDecoder(
(embed_tokens): Embedding(50304, 2560, padding_idx=1)
(embed_positions): OPTLearnedPositionalEmbedding(2050, 2560)
(final_layer_norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
(layers): ModuleList(
(0-31): 32 x OPTDecoderLayer(
(self_attn): OPTAttention(
(k_proj): Linear(in_features=2560, out_features=2560, bias=True)
(v_proj): Linear(in_features=2560, out_features=2560, bias=True)
(q_proj): Linear(in_features=2560, out_features=2560, bias=True)
(out_proj): Linear(in_features=2560, out_features=2560, bias=True)
)
(activation_fn): ReLU()
(self_attn_layer_norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=2560, out_features=10240, bias=True)
(fc2): Linear(in_features=10240, out_features=2560, bias=True)
(final_layer_norm): LayerNorm((2560,), eps=1e-05, elementwise_affine=True)
)
)
)
)
(lm_head): Linear(in_features=2560, out_features=50304, bias=False)
) got multiple values for keyword argument 'inputs_embeds'
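
The final line of the traceback is a plain Python error: a callable received the same keyword argument twice. A minimal sketch of how that can happen follows. It assumes (this is not confirmed by the log) that the wrapper's forward builds its own `inputs_embeds` and also forwards the caller's extra keyword arguments, so that passing `inputs_embeds` from `main.py` line 329 would collide with it. `InnerModel`, `wrapper_forward`, and `call_and_report` are hypothetical stand-ins, not the transformers API.

```python
class InnerModel:
    """Hypothetical stand-in for the inner language model (a callable object)."""

    def __call__(self, inputs_embeds=None, attention_mask=None, **kwargs):
        return {"inputs_embeds": inputs_embeds, "attention_mask": attention_mask}

    def __repr__(self):
        return "InnerModel(...)"


language_model = InnerModel()


def wrapper_forward(pixel_values, **kwargs):
    """Hypothetical wrapper: builds its own embeddings, then forwards **kwargs."""
    inputs_embeds = f"embeds({pixel_values})"
    # inputs_embeds is passed explicitly AND may also be present in kwargs.
    return language_model(inputs_embeds=inputs_embeds, **kwargs)


def call_and_report(**caller_kwargs):
    """Call the wrapper and return "ok" or the TypeError message."""
    try:
        wrapper_forward("img", **caller_kwargs)
        return "ok"
    except TypeError as exc:
        return str(exc)


# Only passing keywords the wrapper does not set itself works.
ok_result = call_and_report(attention_mask="mask")

# Passing inputs_embeds from the caller collides with the wrapper's own value.
error_message = call_and_report(inputs_embeds="my_embeds")

print(ok_result)
print(error_message)
```

If this assumption matches the installed transformers version, the place to look is the keyword arguments passed to `eval_model.model(` at `main.py` line 329, in particular whether `inputs_embeds` is among them.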