[poppler] Could not parse charref for nameToUnicode errors

Wed Dec 19 04:06:34 PST 2007

Albert Astals Cid wrote:
> Hi Ed, I'm getting lots of "Could not parse charref for nameToUnicode" after 
> applying your latest patch for the Adobe Glyph Naming convention in 
> http://home.zcu.cz/~jklement/spolehlivost.pdf
> 
> Is it normal?

It would probably be better to remove the warnings when a glyph name
cannot be parsed. The above PDF file has a toUnicode map, so the glyph
names are not required for text extraction.

> BTW, I'm getting the same output as without your patch, but so many 
> warnings "scare" me.

As the PDF has a toUnicode map, the glyph names are not used for
copy/paste of text, so there will be no difference in output.
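
A minimal sketch of why the output does not change, assuming the map
built from glyph names is filled in first and the toUnicode entries are
then applied on top of it. The names build_text_map, glyph_names,
to_unicode and name_to_text are hypothetical and are not poppler's API.

```python
from typing import Callable, Dict, Optional


def build_text_map(
    glyph_names: Dict[int, str],
    to_unicode: Dict[int, str],
    name_to_text: Callable[[str], Optional[str]],
) -> Dict[int, str]:
    """Build a char-code -> text map (hypothetical helper, not poppler code)."""
    text_map: Dict[int, str] = {}
    # Step 1: try every glyph name. A name that cannot be parsed yields
    # None and is skipped; this is the step where a warning would be printed.
    for code, name in glyph_names.items():
        text = name_to_text(name)
        if text is not None:
            text_map[code] = text
    # Step 2: toUnicode entries override whatever the glyph names gave,
    # so for codes covered by toUnicode the names never affect the output.
    text_map.update(to_unicode)
    return text_map
```

Under this assumption the glyph names are still parsed, which would
explain the warnings, but every code covered by the toUnicode map ends
up with the toUnicode text.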

I've created a test file to test the patch:

http://annarchy.freedesktop.org/~ajohnson/test.pdf

The numbers "1", "2", and "3" are mapped to the text "test", "text",
and "the". The "Z" has the glyph name "g1", so it should be ignored when
extracting text.
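
As an illustration of how one glyph can map to several characters, and
of an unparseable name being ignored, here is a rough sketch. It covers
only the "uni" form of the Adobe glyph naming rules (groups of four
uppercase hex digits) plus the "_" and "." handling; lookup in the
glyph list and the "u" form are left out. The glyph names actually used
in the test file are not listed in this mail, so the names in the usage
note are made up, and glyph_name_to_text is a hypothetical helper.

```python
import re

# "uni" followed by one or more groups of exactly four uppercase hex digits.
_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")


def glyph_name_to_text(name: str) -> str:
    """Map a glyph name to text; components that cannot be parsed give ''."""
    base = name.split(".", 1)[0]      # drop any suffix after the first period
    out = []
    for comp in base.split("_"):      # components are joined with underscores
        m = _UNI.fullmatch(comp)
        if m is None:
            continue                  # e.g. "g1": ignored, contributes nothing
        digits = m.group(1)
        units = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if any(0xD800 <= u <= 0xDFFF for u in units):
            continue                  # surrogate values are not allowed
        out.append("".join(chr(u) for u in units))
    return "".join(out)
```

With this sketch, glyph_name_to_text("uni0074006500730074") gives
"test", glyph_name_to_text("uni0074_uni0068_uni0065") gives "the", and
glyph_name_to_text("g1") gives the empty string.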

I have found a bug in the code. With the test file I get:

 $ pdftotext test.pdf -
 Error: Could not parse charref for nameToUnicode: g1
 This is = test of text extr=?tion using the glyph n=mes

The output should be:
 This is a test of text extraction using the glyph names

It looks like the glyph names "u00061" and "u0063" are not decoded
correctly.
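
For reference, the Adobe glyph naming rules treat a name of the form
"u" followed by 4 to 6 uppercase hexadecimal digits as a single code
point. The sketch below is an illustration in Python, not poppler's
code; parse_u_name and misread_as_decimal are hypothetical helpers. One
possible reading of the output above, not confirmed against the code:
"=" is character 61 and "?" is character 63 in decimal, which is what
reading the digits as decimal rather than hexadecimal would produce.

```python
import re
from typing import Optional

# "u" followed by 4 to 6 uppercase hexadecimal digits.
_U_NAME = re.compile(r"u([0-9A-F]{4,6})")


def parse_u_name(name: str) -> Optional[str]:
    """Decode a "uXXXX".."uXXXXXX" glyph name, or return None."""
    m = _U_NAME.fullmatch(name)
    if m is None:
        return None                       # not a "u" name at all, e.g. "g1"
    cp = int(m.group(1), 16)              # the digits are hexadecimal
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return None                       # outside Unicode, or a surrogate
    return chr(cp)


def misread_as_decimal(name: str) -> str:
    """Assumed bug, for comparison only: read the digits as decimal."""
    return chr(int(name[1:], 10))
```

Here parse_u_name("u00061") gives "a" and parse_u_name("u0063") gives
"c", while misread_as_decimal gives "=" and "?" for the same two names.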