Using existing NLP pre-trained encoders like BERT, RoBERTa

What do you think about combining your architecture with existing pre-trained encoders? Can BERT as an **prior_encoder**  help achieve the better results?