Description
Grobid version
grobid/grobid:0.8.2-full - Docker - Deep Learning Model
Operating System and architecture (arm64, amd64, x86, etc.)
No response
What is your Java version
No response
Log and information
No response
Further information
Hello team,
We’ve noticed that GROBID is not parsing email addresses from some PDFs, even though the emails are clearly visible in the document. I’ve attached one such example where the email is right on the first page, but it isn’t captured in the extracted XML.
Additionally, for one of the PDFs, the author’s name was extracted as Chinese characters, even though it appears in English (Latin script) in the original file. The PDF and the generated XML are both attached for reference.
Please let me know if you need any additional details or sample files to investigate this further.
Thanks!
The file below was parsed with the author names in Chinese characters:
parsed_with_chinese.xml
English_PDF.pdf
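In case it helps with triage, here is a minimal sketch of how both symptoms could be checked in the generated TEI XML. It assumes the output uses the standard TEI namespace and places `<email>` inside each header `<author>` element. The names `summarize_authors` and `CJK_RE` are hypothetical helpers written for this report, not part of GROBID, and the CJK check only covers the basic Unified Ideographs block.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the generated XML uses the standard TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Approximate check: basic CJK Unified Ideographs block only.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def summarize_authors(tei_xml):
    """Return name, emails, and a CJK flag for each header author."""
    root = ET.fromstring(tei_xml)
    header = root.find("tei:teiHeader", TEI_NS)
    if header is None:
        return []
    results = []
    # Only header authors are inspected, not authors of cited references.
    for author in header.iterfind(".//tei:author", TEI_NS):
        pers = author.find("tei:persName", TEI_NS)
        name = ""
        if pers is not None:
            # Join forename/surname parts with single spaces.
            name = " ".join(t.strip() for t in pers.itertext() if t.strip())
        emails = [
            e.text.strip()
            for e in author.iterfind(".//tei:email", TEI_NS)
            if e.text and e.text.strip()
        ]
        results.append(
            {"name": name, "emails": emails, "has_cjk": bool(CJK_RE.search(name))}
        )
    return results
```

Running it on the attached output, for example `summarize_authors(open("parsed_with_chinese.xml", encoding="utf-8").read())`, should list any author whose `emails` list is empty or whose `has_cjk` flag is set.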