Hi,
Thanks for uploading CORA to GitHub! I am trying to use your package in my project and wanted to check whether the authors intended the language_id to be added twice to the mDPR outputs before they are passed to mGEN.
In mDPR/dense_retriever.py, the method parse_qa_jsonlines_file adds the 2-letter language id to the question when the question is encoded for mDPR. Assuming this is intentional, the same 2-letter language id is then added again when the mDPR outputs are converted to seq2seq format (here).
As a result, by the time the input sequence is sent to mGEN, the question has the same language id appended to its end twice. We will follow the same format if that is what the authors intended.
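To make the behaviour concrete, here is a minimal sketch of what I believe is happening. The helper add_lang_tag, the example question, and the " [xx]" tag format are my own assumptions for illustration; this is not the actual CORA code, only a stand-in for the two steps described above.

```python
def add_lang_tag(text: str, lang: str) -> str:
    """Append a 2-letter language id to the text.

    Hypothetical helper: the real tag format used in CORA may differ.
    """
    return f"{text} [{lang}]"


question = "who wrote hamlet"
lang = "en"

# Step 1: the language id is appended when the question is prepared for mDPR
# (what parse_qa_jsonlines_file appears to do).
retriever_question = add_lang_tag(question, lang)

# Step 2: the language id is appended again when the mDPR outputs are
# converted to the seq2seq input for mGEN.
mgen_source = add_lang_tag(retriever_question, lang)

print(mgen_source)
```

With these assumptions the mGEN input ends with the same tag repeated, which matches what I see after running the conversion step.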