Diacritics not working #1

@oorzaak

Hi,

I tried the pdf2text class. It seems to work fine except for processing diacritics like é or ë, which are used frequently in the files that I need to "read".

Maybe there is a way to solve this by adding another character set which can handle these characters? Unfortunately I do not know how to add this myself.

I tried to find out which type of decoding the pdf2text class detects for my PDFs in the function getDecodedStream(). I edited the code to display the $key variable, which is reported as FlateDecode. When viewing the PDFs in a code editor I also see /Filter/FlateDecode in the PDF header. I do not know how to go on from here; I hope that you do?
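
For context, here is a minimal sketch of why FlateDecode by itself may not be the cause. It is written in Python rather than the class's PHP, purely as an illustration. FlateDecode is zlib compression of the stream bytes and says nothing about the character set. In the inflated content, accented letters in literal strings can appear as octal escapes such as `\351`, which still have to be mapped through the font's encoding. The sample stream, the helper name `decode_literal_string`, and the choice of WinAnsiEncoding (approximated here by cp1252) are assumptions for this example, not details taken from pdf2text or from the affected files.

```python
import re
import zlib

# Hypothetical content stream: the text "café" shown with the Tj operator.
# The é is written as the octal escape \351 (byte 0xE9 in WinAnsiEncoding).
raw_stream = rb"BT /F1 12 Tf (caf\351) Tj ET"

# /FlateDecode is zlib compression only; inflating returns the original bytes.
compressed = zlib.compress(raw_stream)
inflated = zlib.decompress(compressed)


def decode_literal_string(data: bytes, encoding: str = "cp1252") -> str:
    """Resolve \\ddd octal escapes, then decode with the assumed font encoding."""
    unescaped = re.sub(
        rb"\\([0-7]{1,3})",
        lambda m: bytes([int(m.group(1), 8) & 0xFF]),
        data,
    )
    return unescaped.decode(encoding)


# Pull the literal string that precedes the Tj (show text) operator.
match = re.search(rb"\((.*?)\)\s*Tj", inflated)
text = decode_literal_string(match.group(1))
print(text)
```

If the extracted text shows the escape digits or a wrong character where é or ë should be, the step to look at would be this escape and encoding handling rather than the FlateDecode inflation. Files whose fonts use a different encoding or a ToUnicode map would need more than this sketch covers.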

Kind regards, Frits
