[poppler] Multicolumn select

Baz brian.ewins at gmail.com
Mon Dec 7 19:00:57 PST 2009


2009/12/7 Albert Astals Cid <aacid at kde.org>:
> On Monday 23 November 2009 09:37:42, you wrote:
>> 2009/11/18 Albert Astals Cid <aacid at kde.org>:
>> > On Monday, 16 November 2009, Baz wrote:
>> >> I've checked now... yes pdftotext with no flags will hit the new
>> >> reading order code.
>> >
>> > And that is good or bad? :D
>>
>> It turns out, good.
>>
>> These are the results of comparing the sizes of diffs against Acrobat
>> output for poppler before and after the patch. The diff is just done
>> on word order, to try to pick up paragraphs that have been misplaced.
>> The filenames refer to the bugzillas where I found these: freedesktop,
>> gnome, ubuntu launchpad, and kde.
>
> BTW can you have a look at http://bugs.freedesktop.org/show_bug.cgi?id=25482 ?
>
> Thanks,
>  Albert
>

It's pushing letters semi-randomly into two separate line objects. A
plausible explanation is that the algorithm scans from the right
edge of one glyph to the left edge of the next when adding glyphs to
words (or words to lines). That would skip overlapped glyphs/words,
producing everything I'm seeing here, and the code does do something
like that.

Assuming that's right, there's an obvious fix (always scan left edge to
left edge), and it's not related to the other selection bugs. I'll
comment on bugzilla once I've tracked this down in the coalesce()
code.
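A toy sketch of the hypothesis and the proposed fix, in Python rather than poppler's C++. `Glyph`, `group_words` and `max_gap` are hypothetical names made up for illustration; this is not poppler's TextOutputDev API or the actual coalesce() logic, just a greedy left-to-right chaining of glyphs into words under each scanning rule:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    char: str
    x_min: float  # left edge
    x_max: float  # right edge


def group_words(glyphs, max_gap, left_to_left):
    """Greedily chain glyphs into words, left to right (hypothetical helper).

    left_to_left=False: the search window for the next glyph starts at the
    current glyph's right edge, so a glyph that overlaps it is skipped.
    left_to_left=True: any glyph whose left edge is right of the current
    glyph's left edge (and within max_gap of its right edge) qualifies.
    """

    def is_next(cur, g):
        if left_to_left:
            return cur.x_min < g.x_min <= cur.x_max + max_gap
        return cur.x_max <= g.x_min <= cur.x_max + max_gap

    remaining = sorted(glyphs, key=lambda g: g.x_min)
    words = []
    while remaining:
        # Start a new word at the leftmost glyph not yet assigned.
        cur = remaining.pop(0)
        word = [cur]
        while True:
            candidates = [g for g in remaining if is_next(cur, g)]
            if not candidates:
                break
            # Take the nearest qualifying glyph and continue the chain from it.
            cur = min(candidates, key=lambda g: g.x_min)
            remaining.remove(cur)
            word.append(cur)
        words.append("".join(g.char for g in word))
    return words


# "V" is kerned so that it starts 2 units before "A" ends; "X" is a separate word.
glyphs = [Glyph("A", 0, 10), Glyph("V", 8, 18), Glyph("E", 18, 26), Glyph("X", 40, 48)]
print(group_words(glyphs, max_gap=2, left_to_left=False))  # ['A', 'VE', 'X']
print(group_words(glyphs, max_gap=2, left_to_left=True))   # ['AVE', 'X']
```

With the right-edge window the overlapped "V" is skipped and ends up starting its own word, which is the kind of split described above; scanning left edge to left edge keeps "AVE" together while the 14-unit gap before "X" still separates the words.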

-Baz

