[poppler] Differing number of items returned from get_text{, layout} for glyphs over page edge

Carlos Garcia Campos carlosgc at gnome.org
Sat Nov 2 15:58:34 CET 2013


Peter Waller <peter at scraperwiki.com> writes:

> On 2 November 2013 13:22, Carlos Garcia Campos <carlosgc at gnome.org> wrote:
>
>> I don't think we should return characters that are not inside the
>> page. What is your use case exactly?
>>
>> In evince we use the layout information to implement caret navigation,
>> for example, it doesn't make sense to move the caret outside the
>> page. In the case of selections, you can pass a bigger selection
>> rectangle to get the text off the page.
>>
>
> Unfortunate that we're having this conversation in two places. What's the
> etiquette here?

I replied to the bug before reading the mailing list. We can discuss it
here. 

> We're using the layout information of glyphs in (frequently poorly
> formatted) PDFs to try and inform extracting data from it. So I'm
> simultaneously after the glyphs and where they are. We're currently using
> the glib interface. I'd be happy to ask for glyphs in a (-inf, -inf, +inf,
> +inf) poppler::Rectangle except that I don't see a way to simultaneously
> get layout and font information.

Maybe we could add poppler_page_get_text_layout_for_rectangle and
poppler_page_get_text_attributes_for_rectangle, so that you can pass a
rectangle bigger than the page bbox (or a smaller one, of course).
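
To make the idea concrete, here is a rough sketch of filtering glyph boxes
by a caller-supplied rectangle. It is independent of poppler: the Rect
type and both function names are hypothetical stand-ins, not existing or
planned API, and the overlap rule (any non-zero intersection counts) is
only an assumption about how such a function might decide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a rectangle with corners (x1, y1) and (x2, y2). */
typedef struct { double x1, y1, x2, y2; } Rect;

/* Returns non-zero when the two rectangles overlap with non-zero area. */
static int rect_intersects(const Rect *a, const Rect *b)
{
    return a->x1 < b->x2 && b->x1 < a->x2 &&
           a->y1 < b->y2 && b->y1 < a->y2;
}

/* Writes the indices of the glyph boxes that intersect `area` into `out`
 * (which must have room for n entries) and returns how many were written. */
static size_t glyphs_in_rect(const Rect *glyphs, size_t n,
                             const Rect *area, size_t *out)
{
    size_t k = 0;

    assert(glyphs != NULL && area != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (rect_intersects(&glyphs[i], area))
            out[k++] = i;
    }
    return k;
}
```

With a rule like this, passing the page bbox selects only glyphs that
touch the page, while passing a much larger rectangle also selects glyphs
that lie entirely off the page.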

> Thanks,
>
> - Peter

-- 
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462