[poppler] Differing number of items returned from get_text{, layout} for glyphs over page edge

Peter Waller peter at scraperwiki.com
Sat Nov 2 14:31:09 CET 2013


On 2 November 2013 13:22, Carlos Garcia Campos <carlosgc at gnome.org> wrote:

> I don't think we should return characters that are not inside the
> page. What is your use case exactly?
>
> In evince we use the layout information to implement caret navigation,
> for example, it doesn't make sense to move the caret outside the
> page. In the case of selections, you can pass a bigger selection
> rectangle to get the text off the page.
>

Unfortunate that we're having this conversation in two places. What's the
etiquette here?

We're using the layout information of glyphs in (frequently poorly
formatted) PDFs to help us extract data from them, so I need both the
glyphs and where they are. We're currently using the glib interface. I'd
be happy to ask for glyphs in a (-inf, -inf, +inf, +inf) poppler::Rectangle,
except that I don't see a way to get layout and font information at the
same time.
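The pairing described above, with one character, its box and its font per glyph, can be sketched in Python for illustration. The sketch assumes you have already pulled the page text, the per-character rectangles and the font runs out of the glib interface, for example via poppler_page_get_text(), poppler_page_get_text_layout() and, if your poppler version provides it, poppler_page_get_text_attributes(). The names merge_glyphs, FontRun and Glyph are hypothetical and not part of poppler. The inclusive end offset is also an assumption, modelled on PopplerTextAttributes.end_index.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical rectangle type: (x1, y1, x2, y2) in page points, assuming
# x1 <= x2 and y1 <= y2, mirroring the fields of PopplerRectangle.
Rect = Tuple[float, float, float, float]


@dataclass
class FontRun:
    """Hypothetical font run over character offsets into the page text.

    Assumption: `end` is inclusive, like PopplerTextAttributes.end_index.
    """
    start: int
    end: int
    font_name: str
    font_size: float


@dataclass
class Glyph:
    char: str
    rect: Rect
    font_name: Optional[str]
    font_size: Optional[float]
    outside_page: bool  # True if the box is not entirely within the page


def merge_glyphs(text: str, rects: Sequence[Rect], runs: Sequence[FontRun],
                 page_width: float, page_height: float) -> List[Glyph]:
    # Pairing by index is only meaningful with exactly one box per character.
    # A count mismatch (as seen for glyphs over the page edge) would silently
    # shift every later glyph onto the wrong box, so refuse to guess.
    if len(text) != len(rects):
        raise ValueError(f"{len(text)} characters but {len(rects)} rectangles")
    glyphs = []
    for i, (ch, rect) in enumerate(zip(text, rects)):
        # Find the font run covering character offset i, if any.
        run = next((r for r in runs if r.start <= i <= r.end), None)
        x1, y1, x2, y2 = rect
        outside = x1 < 0 or y1 < 0 or x2 > page_width or y2 > page_height
        glyphs.append(Glyph(ch, rect,
                            run.font_name if run else None,
                            run.font_size if run else None,
                            outside))
    return glyphs


# Usage with made-up data: the second glyph extends past the right edge
# of an A4 page (595 x 842 points).
text = "ab"
rects = [(10.0, 10.0, 20.0, 20.0), (590.0, 10.0, 600.0, 20.0)]
runs = [FontRun(0, 1, "Helvetica", 12.0)]
glyphs = merge_glyphs(text, rects, runs, 595.0, 842.0)
```

Raising on a length mismatch is a deliberate choice: Python's zip() would otherwise truncate to the shorter list and misalign every glyph after the first off-page one without any warning.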

Thanks,

- Peter