Inquiry Regarding CIDEr Scores on Flickr30k Evaluation

<img width="573" alt="Image" src="https://github.com/user-attachments/assets/5ffe8b72-3ef0-4596-aed0-b0891d4716d3" />

<img width="541" alt="Image" src="https://github.com/user-attachments/assets/6ac0de50-e154-41bf-bd2a-82b419e7131f" />

<img width="497" alt="Image" src="https://github.com/user-attachments/assets/4a05b5fc-1b9d-4941-ad23-c113ae747b49" />

I sincerely appreciate your outstanding work! I recently used the adversarially fine-tuned ViT-L/14 CLIP model ($FARE^4$) that you provided as the vision_encoder_pretrained model and ran an evaluation on the Flickr30k dataset with llava_eval.sh, using the APGD attack.

However, the CIDEr score reported by my run differs significantly from the results presented in Table 1. I would greatly appreciate any insight into what might cause this discrepancy.

Looking forward to your response. Thank you for your time and assistance! 
