
Why not use the weights of pre-trained BERT? #57

Description

@Lyman-Smoker

Thanks for your great work.

As mentioned in the paper, the multi-modal transformer encoder is randomly initialized.

I am wondering why the encoder is not initialized with the pre-trained weights of BERT instead. Would that hurt performance?
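To make the question concrete, here is a minimal sketch of the kind of initialization I have in mind, assuming PyTorch. The helper `load_matching_weights` and the two toy encoders are hypothetical stand-ins, not code from this repository: in practice `pretrained` would be a BERT checkpoint and `encoder` the multi-modal transformer encoder, and since parameter names generally differ between implementations, a name-mapping step would likely be needed first.

```python
import torch
from torch import nn


def load_matching_weights(target: nn.Module, source_state: dict) -> list:
    """Copy tensors from source_state into target where name and shape match.

    Returns the sorted names of the entries that were copied. Everything
    else in `target` keeps its random initialization.
    """
    target_state = target.state_dict()
    matched = {
        name: tensor
        for name, tensor in source_state.items()
        if name in target_state and target_state[name].shape == tensor.shape
    }
    # strict=False allows loading only a subset of the target's parameters.
    target.load_state_dict(matched, strict=False)
    return sorted(matched)


# Hypothetical stand-ins for a pre-trained model and the encoder to initialize.
pretrained = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, dim_feedforward=64),
    num_layers=2,
)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, dim_feedforward=64),
    num_layers=3,
)

copied = load_matching_weights(encoder, pretrained.state_dict())
```

In this toy setup the first two layers of `encoder` receive the "pre-trained" weights, while the third layer stays randomly initialized.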
