[poppler] Could not parse charref for nameToUnicode errors
Adrian Johnson
ajohnson at redneon.com
Wed Dec 19 04:06:34 PST 2007
Albert Astals Cid wrote:
> Hi Ed, i'm getting lots of "Could not parse charref for nameToUnicode" after
> applying your latest patch for Adobe Glyph Naming convention in
> http://home.zcu.cz/~jklement/spolehlivost.pdf
>
> Is it normal?
It would probably be better to remove the warning when a glyph name
cannot be parsed. The above pdf file has a toUnicode map, so the glyph
names are not required for text extraction.
> BTW, i'm getting the same output than without your patch, but so many
> warnings "scare" me.
As the pdf has a toUnicode map the glyph names are not used for
copy/paste of text so there will be no difference in output.
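To illustrate the precedence described above, here is a minimal sketch, not poppler's actual code. The function name char_to_text, the dict-based maps, and the decode_glyph_name callback are all hypothetical, and it assumes that a font with a toUnicode map never consults its glyph names.

```python
def char_to_text(code, to_unicode, glyph_names, decode_glyph_name):
    """Return the text for one character code (illustrative only).

    to_unicode: dict mapping code -> text, or None if the font has no map.
    glyph_names: dict mapping code -> glyph name.
    decode_glyph_name: hypothetical callback turning a glyph name into text.
    """
    if to_unicode is not None:
        # Assumption: when a toUnicode map exists, glyph names are never used.
        return to_unicode.get(code, "")
    name = glyph_names.get(code)
    # Fall back to the glyph name only when there is no toUnicode map.
    return decode_glyph_name(name) if name is not None else ""
```

Under that assumption, an unparsable glyph name such as "g1" cannot affect the output of a file that carries a toUnicode map, which is why the warnings are only noise there.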
I've created a test file to exercise the patch:
http://annarchy.freedesktop.org/~ajohnson/test.pdf
The digits "1", "2", and "3" are mapped to the text "test", "text",
and "the" respectively. The "Z" has the glyph name "g1", so it should be
ignored when extracting text.
I have found a bug in the code. With the test file I get:
$ pdftotext test.pdf -
Error: Could not parse charref for nameToUnicode: g1
This is = test of text extr=?tion using the glyph n=mes
The output should be:
This is a test of text extraction using the glyph names
It looks like the glyph names "u00061" and "u0063" are not decoded
correctly: every "a" comes out as "=" and the "c" as "?".
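For reference, here is a rough sketch of how I understand the "uXXXX" and "uniXXXX" forms of the Adobe glyph naming convention should decode. This is not the poppler code. The function names are made up for illustration, and the lookup in the Adobe Glyph List (for names like "space") is deliberately left out, so any other name maps to the empty string.

```python
import re

# "u" followed by 4 to 6 uppercase hex digits: a single code point.
_U_FORM = re.compile(r"u([0-9A-F]{4,6})")
# "uni" followed by one or more groups of 4 uppercase hex digits.
_UNI_FORM = re.compile(r"uni((?:[0-9A-F]{4})+)")


def _is_scalar(cp):
    # Surrogates (D800-DFFF) and values above 10FFFF are not valid.
    return 0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFF


def decode_component(name):
    """Decode one underscore-separated component of a glyph name."""
    m = _U_FORM.fullmatch(name)
    if m:
        cp = int(m.group(1), 16)
        return chr(cp) if _is_scalar(cp) else ""
    m = _UNI_FORM.fullmatch(name)
    if m:
        digits = m.group(1)
        cps = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_is_scalar(cp) for cp in cps):
            return "".join(chr(cp) for cp in cps)
        return ""
    # Assumption: no Adobe Glyph List lookup here, so names such as "g1"
    # decode to nothing and are skipped silently instead of warned about.
    return ""


def glyph_name_to_text(glyph_name):
    """Drop any ".suffix", split on "_", and decode each component."""
    base = glyph_name.split(".", 1)[0]
    return "".join(decode_component(part) for part in base.split("_"))
```

With this sketch, glyph_name_to_text("u00061") gives "a", glyph_name_to_text("u0063") gives "c", and glyph_name_to_text("g1") gives the empty string, which matches the expected output for the test file.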