[poppler] Incompatible number of glyphs from glib get_text{, layout}

Peter Waller peter at scraperwiki.com
Wed May 27 09:04:04 PDT 2015


Success!

If I drop the strings in doShowText whose font has no ToUnicode CMap
(`!font->hasToUnicodeCMap()`), I get the desired output from
poppler_page_get_text() and poppler_page_get_text_layout().

http://cgit.freedesktop.org/poppler/poppler/tree/poppler/Gfx.cc?id=poppler-0.33.0#n3936

I do that by just returning early from `Gfx::doShowText()`.

Would a patch be welcomed that does this? I propose that OutputDev
would grow a `needUnicodeText()` which would default to false (so that
we don't influence renderers) and TextOutputDev would return true.
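
A minimal sketch of the idea, using simplified stand-in classes rather
than poppler's real declarations. `Font`, the `shown` vector and the
signatures of `drawString()` and `doShowText()` are illustrative
assumptions; only the names `needUnicodeText()`, `hasToUnicodeCMap()`,
`OutputDev` and `TextOutputDev` come from the proposal above.

```cpp
#include <string>
#include <vector>

// Stand-in for a font; the real check is GfxFont::hasToUnicodeCMap().
struct Font {
    bool toUnicode;
    bool hasToUnicodeCMap() const { return toUnicode; }
};

// Stand-in for OutputDev. Only the proposed hook is modelled.
class OutputDev {
public:
    virtual ~OutputDev() = default;

    // Proposed hook: defaults to false so renderers are unaffected.
    virtual bool needUnicodeText() const { return false; }

    // Hypothetical sink that records what was passed through.
    virtual void drawString(const std::string &s) { shown.push_back(s); }

    std::vector<std::string> shown;
};

// Text extraction wants only strings with a usable Unicode mapping.
class TextOutputDev : public OutputDev {
public:
    bool needUnicodeText() const override { return true; }
};

// Sketch of the early return proposed for Gfx::doShowText().
void doShowText(OutputDev &out, const Font &font, const std::string &s) {
    if (out.needUnicodeText() && !font.hasToUnicodeCMap()) {
        return;  // drop the string: no ToUnicode CMap to map it with
    }
    out.drawString(s);
}
```

With this shape a rendering device still receives every string, while a
text device skips only the strings it could not map to Unicode anyway.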

This would fix cutting and pasting for approximately 10% of our users
and enable us to get text from documents via the poppler API.

I note that the Adobe Reader running on Windows gave junk when
copy-pasting those characters in my example PDF (but it didn't break
copying the rest of the text).

Thanks,

- Peter

