[poppler] Differing number of items returned from get_text{, layout} for glyphs over page edge

Carlos Garcia Campos carlosgc at gnome.org
Thu Dec 5 10:22:45 PST 2013


Germán Póo-Caamaño <gpoo at gnome.org> writes:

> On Wed, 2013-12-04 at 10:53 -0300, Germán Póo-Caamaño wrote:
>> On Wed, 2013-12-04 at 13:34 +0000, Peter Waller wrote:
>> > On 28 November 2013 07:22, Germán Póo-Caamaño <gpoo at gnome.org> wrote:
>> >         FWIW, I tried to use poppler_page_get_text_layout_for_area()
>> >         for implementing the text annotation markup in glib (in the
>> >         demo), but it is noticeably slower than the deprecated
>> >         poppler_page_get_selection_region().
>> >
>> > What are your timings? I tested it and see that
>> > poppler_page_get_text_layout_for_area and
>> > poppler_page_get_text_for_area are ~30% slower for emptyish pages and
>> > less than ~5% slower for more populated pages. In absolute terms it
>> > is on the order of 1 ms, so it doesn't seem a concern to me, but I'm
>> > curious what problem you're encountering.
>> 
>> For highlighting text interactively (or any other text markup), it calls
>> poppler_page_get_text_layout_for_area every time the selection region
>> changes. That is, to get the rectangles and set/update the annotation.
>> 
>> So, the delay is multiplied by the number of updates required.
>
> To make it clear, see the following screencast:
>
> http://calcifer.org/tmp/evince/poppler-get-rectangles-comparison.ogv
>
> On the left, the one that uses poppler_page_get_selection_region() to
> get the rectangles in the selection region, and to the right the one
> that uses poppler_page_get_text_layout_for_area().
>
> See how the highlight follows the cursor in the left one.  In the other
> one, the mouse interaction has already finished (the cursor changes once
> the button is released), and only some seconds later does the computation
> end and the text markup get updated.

The problem is not that get_text_layout is slower than
get_selection_region (it may be, but not by that much). The actual
problem is the way you are using the two APIs. poppler_page_get_text_layout
returns a rectangle for every character in the text, so you are adding a
quadrilateral to the annotation for every selected character, while
poppler_page_get_selection_region returns a list of line blocks. So
what takes much more time is rendering the page itself, not getting the
region/layout. You should iterate over the characters until the end of
the selection or until a new line is found, and add one quadrilateral
per line rather than per character. You can detect new lines because
the rectangle is empty: x1 == x2 && y1 == y2.
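
A minimal sketch of that iteration, under stated assumptions: Rect is a
hypothetical stand-in with the same four fields as PopplerRectangle, and
merge_chars_into_lines() is an illustrative helper, not part of the
poppler API. It assumes the caller passes only the rectangles of the
selected characters, in text order, and that an empty rectangle marks a
new line as described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for PopplerRectangle (same four fields). */
typedef struct {
    double x1, y1, x2, y2;
} Rect;

/* An empty rectangle is treated as a line break in the character layout. */
static int rect_is_empty(const Rect *r)
{
    return r->x1 == r->x2 && r->y1 == r->y2;
}

/*
 * Merge per-character rectangles into one bounding rectangle per line.
 * Writes at most max_lines rectangles into lines and returns how many
 * were written; lines beyond max_lines are dropped.
 */
size_t merge_chars_into_lines(const Rect *chars, size_t n_chars,
                              Rect *lines, size_t max_lines)
{
    size_t n_lines = 0;
    int in_line = 0; /* nonzero while extending lines[n_lines - 1] */

    assert(chars != NULL || n_chars == 0);
    assert(lines != NULL || max_lines == 0);

    for (size_t i = 0; i < n_chars; i++) {
        const Rect *c = &chars[i];

        if (rect_is_empty(c)) {
            /* Line break: the next non-empty rectangle starts a new line. */
            in_line = 0;
            continue;
        }

        if (!in_line) {
            if (n_lines == max_lines)
                break; /* output buffer is full */
            lines[n_lines++] = *c;
            in_line = 1;
        } else {
            /* Grow the current line's bounding box to include this char. */
            Rect *l = &lines[n_lines - 1];
            if (c->x1 < l->x1) l->x1 = c->x1;
            if (c->y1 < l->y1) l->y1 = c->y1;
            if (c->x2 > l->x2) l->x2 = c->x2;
            if (c->y2 > l->y2) l->y2 = c->y2;
        }
    }
    return n_lines;
}
```

Each merged rectangle would then become a single quadrilateral of the
annotation, so a selection spanning three lines adds three quadrilaterals
instead of one per character.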

> -- 
> Germán Poo-Caamaño
> http://calcifer.org/

-- 
Carlos Garcia Campos
PGP key: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x523E6462