When a PDF includes an emoji, such as 😄
(Unicode U+1F604, i.e. 0x1f604),
and the PDF was generated with MS Word:

the Unicode value stored in the PDF contains a space between its two UTF-16 code units. The parsing code is here:
https://pdfium.googlesource.com/pdfium/+/refs/heads/main/core/fpdfapi/font/cpdf_tounicodemap.cpp

Parsing stops at the space, so the result is unicode=0xd83d, which is only the high surrogate.
The correct value is the full byte sequence [d8,3d,de,04]; then bytes([0xd8,0x3d,0xde,0x04]).decode('utf-16-be') => '😄'
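
The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration, not PDFium's code: the sample string `"D83D DE04"` is an assumption about what the screenshot shows, and the function names are hypothetical.

```python
def decode_until_space(hex_text: str) -> int:
    """Mimic the reported behaviour: stop at the first space."""
    first_token = hex_text.split()[0]
    return int(first_token, 16)


def decode_ignoring_space(hex_text: str) -> str:
    """Expected behaviour: drop whitespace, then decode all bytes as UTF-16BE."""
    compact = "".join(hex_text.split())
    return bytes.fromhex(compact).decode("utf-16-be")


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


sample = "D83D DE04"  # assumed form of the value written by MS Word

print(hex(decode_until_space(sample)))             # lone high surrogate
print(decode_ignoring_space(sample))               # the emoji
print(hex(combine_surrogates(0xD83D, 0xDE04)))     # code point of the emoji
```

Stopping at the space leaves 0xd83d, which is not a valid character on its own; joining both code units gives 0x1f604.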