I was unable to achieve the result shown in the UDOP paper.
I used the udop-unimodel-large-224 checkpoint.
My ANLS score is 0.407903.
This is nowhere near the 0.461 reported in the paper's results table.

Since I noticed that the batch size, warmup steps, and weight decay given in https://github.com/microsoft/i-Code/blob/main/i-Code-Doc/scripts/finetune_duebenchmark.sh differ from those reported in the paper, I also tried changing the finetuning script to use the paper's settings.
Lastly, I also tried adding the task prompt prefix, since the existing code does not add one. I followed the approach from #71 (comment).
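For clarity, the prefix change I made amounts to something like the sketch below; the exact prefix string is my own placeholder, not the wording from #71.

```python
# Hypothetical sketch: prepend a task prompt prefix to each question before
# tokenization. The prefix text here is an assumption for illustration only;
# the exact string should follow #71 (comment).
def add_task_prefix(question: str, prefix: str = "question answering.") -> str:
    return f"{prefix} {question}"

print(add_task_prefix("What is the invoice total?"))
```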
Results of the 3 different finetuning configurations:
| Task prefix | Hyperparameter settings | ANLS score |
| --- | --- | --- |
| No | Unchanged finetuning script | 0.407903 |
| No | Paper's settings | 0.40174 |
| Yes | Unchanged finetuning script | 0.408355 |
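For reference, the scores above are ANLS values. A minimal sketch of the metric as commonly defined for DocVQA-style evaluation (average normalized Levenshtein similarity with threshold 0.5) is below; the lowercase/strip normalization is my assumption, and the exact evaluator in due-benchmark may differ in details.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def anls(predictions, gold_answer_lists, tau=0.5):
    # Per question, take the best normalized similarity over all gold
    # answers; similarities below tau count as 0; average over questions.
    total = 0.0
    for pred, golds in zip(predictions, gold_answer_lists):
        best = 0.0
        for gold in golds:
            p, g = pred.strip().lower(), gold.strip().lower()
            nl = levenshtein(p, g) / max(len(p), len(g), 1)
            best = max(best, 1.0 - nl)
        total += best if best >= tau else 0.0
    return total / len(predictions)

print(anls(["paris"], [["Paris"]]))  # exact match after normalization -> 1.0
```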
Other changes I made:
- Changed to use PyTorch's AdamW, based on "loss does not have a grad fn" #63 (comment)
- Within `baselines-master` in the due-benchmark repo (`baselines-master/benchmarker/data/utils.py`), I changed the `dtype` of `label_name` from `U100` to `U1024` to prevent truncation of questions during display

Please assist.
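For context on the `dtype` change: NumPy fixed-width unicode arrays silently clip strings longer than the declared width, which is what truncated long questions. A minimal demonstration (`long_question` is a stand-in, not actual benchmark data):

```python
import numpy as np

# Fixed-width unicode dtypes silently truncate anything longer than the
# declared width; widening from U100 to U1024 keeps long questions intact.
long_question = "q" * 200

truncated = np.array([long_question], dtype="U100")
widened = np.array([long_question], dtype="U1024")

print(len(truncated[0]))  # 100 -- the tail of the string is lost
print(len(widened[0]))    # 200 -- the full string survives
```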