This repository was archived by the owner on Oct 31, 2023. It is now read-only.

Training DPR with different language  #240

Description

@shahad2099

Hello everyone,
I want to train DPR on a different language (Arabic).
I've been trying for almost two weeks, but I feel lost :(
I'd like to know which steps to follow and what I need to change for training.
I know that I should replace the bi-encoder's base model with an Arabic model (an Arabic version of BERT)
and convert the datasets into the DPR format.
Beyond that, I'm not sure what to do next. Should I follow these steps?
1- Run generate_dense_embeddings.py on the Arabic wiki corpus.
2- Then train with train_dense_encoder.py, using the Arabic language model and the Arabic dataset.
3- Finally, evaluate using dense_retriever.py.
Is that what I have to do?

Thank you
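
For reference, here is a minimal sketch of what "converting the datasets into the DPR format" might look like. It assumes the retriever training data is a JSON list of records with the fields `question`, `answers`, `positive_ctxs`, `negative_ctxs` and `hard_negative_ctxs`, where each context has a `title` and a `text`. That is how I understand the DPR README, so please check it against the repository. The helper names `to_dpr_record` and `dumps_dpr` are hypothetical and not part of DPR.

```python
import json


def to_dpr_record(question, answers, positive_passages,
                  negative_passages=(), hard_negative_passages=()):
    """Build one training record in the (assumed) DPR retriever format.

    Each passage is a dict with a "text" key and an optional "title" key.
    """
    def ctx(passage):
        # Keep only the fields assumed to be read during training.
        return {"title": passage.get("title", ""), "text": passage["text"]}

    return {
        "question": question,
        "answers": list(answers),
        "positive_ctxs": [ctx(p) for p in positive_passages],
        "negative_ctxs": [ctx(p) for p in negative_passages],
        "hard_negative_ctxs": [ctx(p) for p in hard_negative_passages],
    }


def dumps_dpr(records):
    """Serialize records as a JSON list.

    ensure_ascii=False keeps Arabic text readable instead of \\uXXXX escapes.
    """
    return json.dumps(records, ensure_ascii=False, indent=2)


# Toy example with a single Arabic question/passage pair.
record = to_dpr_record(
    question="ما هي عاصمة مصر؟",
    answers=["القاهرة"],
    positive_passages=[{"title": "مصر", "text": "القاهرة هي عاصمة مصر."}],
)
serialized = dumps_dpr([record])
```

The resulting string can be written to a file opened with `encoding="utf-8"` and used as the training file, assuming the field names above match what train_dense_encoder.py expects.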
