[poppler] Could not parse charref for nameToUnicode errors

Jonathan Kew jonathan_kew at sil.org
Wed Dec 19 06:11:07 PST 2007


On 19 Dec 2007, at 12:06 pm, Adrian Johnson wrote:

> I've created a test file to test the patch
>
> http://annarchy.freedesktop.org/~ajohnson/test.pdf
>
> The numbers "1", "2", and "3" are mapped to the text "test", "text",
> and "the". The "Z" has the glyph name "g1" so it should be ignored when
> extracting text.
>
> I have found a bug in the code. With the test file I get
>
>  $ pdftotext test.pdf -
>  Error: Could not parse charref for nameToUnicode: g1
>  This is = test of text extr=?tion using the glyph n=mes
>
> The output should be:
>  This is a test of text extraction using the glyph names
>
> It looks like the glyph names "u00061" and "u0063" are not decoded
> correctly.

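To be more specific, it looks as though the names are being
interpreted as decimal rather than hexadecimal: character codes 61
and 63 decimal are "=" and "?", which is exactly what shows up in
place of "a" (0x61) and "c" (0x63) in the output.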

Could it be that some implementations of sscanf require a 0x prefix
to scan hex, and otherwise treat the value as decimal?
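
For anyone who wants to check the decimal-versus-hex reading on their own platform, here is a minimal sketch in C. The helper names are hypothetical and this is not the poppler code. I'm assuming the name is a "u" followed by hex digits, and I haven't confirmed which conversion the patch actually uses.

```c
#include <stdio.h>

/* Hypothetical helpers for illustration only; not poppler functions. */

/* Read the digits after the leading 'u' as hexadecimal.
   Returns 1 if a value was parsed, 0 otherwise. */
int parse_glyph_hex(const char *name, unsigned int *out)
{
    if (name[0] != 'u')
        return 0;
    /* %x reads hexadecimal digits; a 0x prefix is optional. */
    return sscanf(name + 1, "%x", out) == 1;
}

/* Read the same digits as decimal, which reproduces the bad output. */
int parse_glyph_decimal(const char *name, unsigned int *out)
{
    if (name[0] != 'u')
        return 0;
    /* %u reads decimal digits, so "0061" becomes 61, i.e. '='. */
    return sscanf(name + 1, "%u", out) == 1;
}
```

Calling parse_glyph_hex("u0061", &v) should leave v at 0x61 ("a"), while parse_glyph_decimal on the same name should give 61 ("="). A name like "g1" is rejected by both.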

JK


