[poppler] Incompatible number of glyphs from glib get_text{, layout}
Peter Waller
peter at scraperwiki.com
Wed May 27 09:04:04 PDT 2015
Success!
If I drop strings in doShowText whose font has no ToUnicode CMap
(`!font->hasToUnicodeCMap()`), I get the desired output from
poppler_page_get_text() and poppler_page_get_text_layout().
http://cgit.freedesktop.org/poppler/poppler/tree/poppler/Gfx.cc?id=poppler-0.33.0#n3936
I do that by just returning early from `Gfx::doShowText()`.
Would a patch that does this be welcome? I propose that OutputDev
grow a `needUnicodeText()` method which defaults to false (so that
we don't influence renderers), with TextOutputDev overriding it to return true.
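
A minimal sketch of how such a hook could work, assuming simplified stand-in classes rather than poppler's real ones. Only the names `needUnicodeText()`, `hasToUnicodeCMap()`, `OutputDev`, `TextOutputDev` and `doShowText` come from the proposal above; the `Font` struct, the `drawString()` signature and the `drawn` member are hypothetical and exist only to make the example self-contained.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for poppler's font class; only the one query matters here.
struct Font {
    bool toUnicode;
    bool hasToUnicodeCMap() const { return toUnicode; }
};

// Simplified stand-in for OutputDev carrying the proposed hook.
class OutputDev {
public:
    virtual ~OutputDev() = default;
    // Proposed hook: false by default, so renderers keep their current behaviour.
    virtual bool needUnicodeText() const { return false; }
    // Hypothetical sink that records every string the device receives.
    virtual void drawString(const std::string &s) { drawn.push_back(s); }
    std::vector<std::string> drawn;
};

// Text extraction wants only strings that can be mapped to Unicode.
class TextOutputDev : public OutputDev {
public:
    bool needUnicodeText() const override { return true; }
};

// Stand-in for Gfx::doShowText(): return early when the device needs Unicode
// text and the font has no ToUnicode CMap to supply it.
void doShowText(OutputDev &out, const Font &font, const std::string &s) {
    if (out.needUnicodeText() && !font.hasToUnicodeCMap()) {
        return;
    }
    out.drawString(s);
}
```

With this shape, a device that keeps the default still receives every string, while a TextOutputDev silently skips strings from fonts without a ToUnicode CMap.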
This would fix cutting and pasting for approximately 10% of our users
and enable us to get text from documents via the poppler API.
I note that Adobe Reader on Windows gave junk when copy-pasting
those characters from my example PDF (but it didn't break
copying the rest of the text).
Thanks,
- Peter