Adding the language id twice to the question before passing it to mGEN

Hi, 

Thanks for uploading CORA to GitHub! I am trying to use your package in my project, and wanted to check whether the authors intended to add the `language_id` twice to the outputs of `mDPR` before passing them to `mGEN`.

In [mDPR/dense_retriever.py](https://github.com/AkariAsai/CORA/blob/main/mDPR/dense_retriever.py), in the method `parse_qa_jsonlines_file`, the two-letter language id is added to the question when the question is encoded for `mDPR`. Assuming this was intentional, the two-letter language id is then added again when the mDPR outputs are converted to seq2seq format ([here](https://github.com/AkariAsai/CORA/blob/793db54b9c0dc67e240e5f348c729b03dd4f58be/mGEN/convert_dpr_retrieval_results_to_seq2seq.py#L148)).

As a result, by the time the input sequence is sent to `mGEN`, the same language id has been appended to the end of the question **twice**. If the authors intended this, we will follow the same format.
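To make the behaviour concrete, here is a minimal sketch of what I think is happening. This is not the actual CORA code: the helper names (`add_lang_tag`, `add_lang_tag_once`) and the `[en]` tag format are assumptions made only for illustration.

```python
def add_lang_tag(question: str, lang: str) -> str:
    """Hypothetical helper: append a language id to a question.

    The " [xx]" format is an assumption, not necessarily what CORA uses.
    """
    return f"{question} [{lang}]"


def add_lang_tag_once(question: str, lang: str) -> str:
    """Hypothetical guard: append the tag only if it is not already there."""
    tag = f"[{lang}]"
    if question.rstrip().endswith(tag):
        return question
    return f"{question} {tag}"


question = "who wrote hamlet"

# Step 1: the tag is added when the question is prepared for mDPR.
after_retrieval = add_lang_tag(question, "en")

# Step 2: the tag is added again when mDPR outputs are converted for mGEN.
after_conversion = add_lang_tag(after_retrieval, "en")

# With a guard in step 2, the tag would appear only once.
guarded = add_lang_tag_once(after_retrieval, "en")

print(after_conversion)
print(guarded)
```

If the duplication is not intended, a guard like `add_lang_tag_once` in the conversion step would be one way to avoid it. If it is intended, we would need to reproduce the doubled tag so that our inputs match what `mGEN` saw during training.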
