[poppler] Poppler 0.15.2 (0.16 Beta 2) released

Albert Astals Cid aacid at kde.org
Sat Nov 20 04:50:11 PST 2010


Hi, (please do not mail me directly unless you really have to)

On Saturday, 20 November 2010, Brian Ewins wrote:
> On 20 Nov 2010, at 00:22, Albert Astals Cid <aacid at kde.org> wrote:
> > On Monday, 15 November 2010, Baz wrote:
> >> On 14 November 2010 16:23, Albert Astals Cid <aacid at kde.org> wrote:
> >>> BTW here comes the updated release schedule
> >>> 
> >>> * Nov 29 (+2 weeks) Poppler 0.15.3 (0.16 RC)
> >> 
> >> Could you consider the performance bugfix on bug 3188 for this release?
> >> https://bugs.freedesktop.org/show_bug.cgi?id=3188#c65
> >> 
> >> Marek commented on the bug that he's working on further changes, but
> >> that's to fix a different section of slow code triggered by the same
> >> test document that Dennis Sheil mentioned on the list
> >> (http://www.ratp.info/picts/touristes/photos/plan%20paris-touriste.pdf).
> > 
> > That fix changes pdftotext output. I have asked the author whether that is
> > expected (I'd expect only more speed, not different output).
> 
> The heuristic has changed slightly so yes there are circumstances where you
> could get different output; I've not seen an example 

I can send you the pdf file if you want.

> (though I guess it is
> likely to happen somewhere in that scattershot bus map). 2 things have
> changed, the initial sort order and the heuristic for deciding which
> blocks to visit first. I think only the first of these changes the
> results; a long explanation follows.
> 
> The previous heuristic said block A must be visited before block B if it is
> entirely to the left of B and there is no block C that is above A, below
> B, and overlaps both horizontally. The new heuristic avoids an explicit
> search for block C by tracking an interval that starts off as the
> horizontal bounds of B and widens to cover any blocks that it overlaps as
> we move down the page. In the 3 block case, this is the same, but it
> differs when you have 5 or more blocks: If there is a block D that
> overlaps neither A nor B, and blocks E, F such that E overlaps A and D,
> and F overlaps D and B, then A will be marked to visit before B. However,
> this case would have happened before by induction anyway: D would have been
> visited before B, and A before D under the old rule. So I don't think this
> change did anything other than improve speed. It relies on the blocks
> having been pre-sorted vertically though, which is the other change.
> 
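
To make the rule above concrete, here is a minimal Python sketch (not the
poppler code, which is C++). It implements a literal reading of the old rule
with its explicit search for block C, plus the interval widening on its own.
The Block type, the function names and the "y grows down the page" convention
are all assumptions made for illustration. How the widened interval feeds the
final before/after decision is not spelled out in the message, so that part is
deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical text block; y is assumed to grow down the page."""
    name: str
    x_min: float
    x_max: float
    y_min: float  # top edge
    y_max: float  # bottom edge


def overlaps_horizontally(p, q):
    """True if the horizontal extents of p and q intersect."""
    return p.x_min < q.x_max and q.x_min < p.x_max


def before_by_old_rule(a, b, blocks):
    """Old rule as worded above: A is entirely left of B and no block C
    lies above A, below B, and overlaps both horizontally."""
    if a.x_max > b.x_min:
        return False  # A is not entirely to the left of B
    for c in blocks:
        if c is a or c is b:
            continue
        above_a = c.y_max <= a.y_min
        below_b = c.y_min >= b.y_max
        if (above_a and below_b
                and overlaps_horizontally(c, a)
                and overlaps_horizontally(c, b)):
            return False  # C blocks the "A before B" conclusion
    return True


def widened_interval(b, blocks):
    """Start from B's horizontal bounds and, moving down the page, widen
    the interval to cover every lower block that overlaps it."""
    lo, hi = b.x_min, b.x_max
    for blk in sorted(blocks, key=lambda k: k.y_min):
        if blk.y_min < b.y_max:
            continue  # not below B (this also skips B itself)
        if blk.x_min < hi and lo < blk.x_max:
            lo, hi = min(lo, blk.x_min), max(hi, blk.x_max)
    return lo, hi
```

The old rule scans every candidate C for each pair (A, B), whereas the
widening is a single pass over blocks already sorted top to bottom, which is
why it depends on the vertical pre-sort.
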
> If the heuristic does not have any way to decide which of two blocks should
> be visited first, previously it would visit the first one in physical
> order. Now it visits the one closest to the top; and leftmost if at the
> same height (or top right for RTL, etc). This tends to be the same but is
> not always; the bus map labels were in a random order physically, for
> example. However, for normal text, top left is a decent guess. Where it can
> go wrong is e.g. if you have a 2-column doc with leading vertical space in
> the left col, and the left column ends up overlapping the right (due to
> some non-rectangular layout). In this case the heuristic will not
> spot that the left column should have been first.
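
The fallback order described above can be sketched in a few lines of Python.
This is only an illustration under assumptions: Label and fallback_order are
made-up names, y is assumed to grow down the page, and "top right for RTL" is
read as "rightmost first at the same height".

```python
from collections import namedtuple

# Hypothetical label: a name, its horizontal extent, and its top edge.
Label = namedtuple("Label", "name x_min x_max y_min")


def fallback_order(labels, rtl=False):
    """Order blocks the heuristic cannot otherwise rank: topmost first,
    then leftmost (or rightmost for right-to-left text)."""
    def key(label):
        horizontal = -label.x_max if rtl else label.x_min
        return (label.y_min, horizontal)
    return sorted(labels, key=key)
```

The old behaviour corresponds to leaving the list in its physical order, so
the two agree only when the file already stores blocks top to bottom.
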

Well, reading this long description, I don't see this as an optimization but as
a [small] behaviour change. We are past the feature freeze, so I'm a bit
hesitant to let this in. Anyway, what I'll do is this:
 * Run the test suite and see for how many files pdftotext gives a different
   result with and without the patch.
 * If the number of files is relatively small, see whether the differences are
   improvements; if they are not, see whether we can "live" with the changes.
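
The first step could look roughly like the Python sketch below. It assumes
both builds of pdftotext have already written one .txt file per test PDF into
two directories; the directory layout and the changed_outputs name are
hypothetical, not part of any existing test harness.

```python
import filecmp
from pathlib import Path


def changed_outputs(before_dir, after_dir):
    """Return the names of .txt files whose content differs between runs."""
    before_dir, after_dir = Path(before_dir), Path(after_dir)
    changed = []
    for before in sorted(before_dir.glob("*.txt")):
        after = after_dir / before.name
        # A file missing from the patched run is reported as a change too.
        if not after.exists() or not filecmp.cmp(before, after, shallow=False):
            changed.append(before.name)
    return changed
```

The length of the returned list is the number to look at, and the names tell
you which outputs to diff by hand.
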

> 
> As I've said in the past, we'll get better results than this if we take the
> reading order from tagged PDF. Otherwise it is just guesswork.

Patches welcome ;-)

Albert

> 
> > Albert
> > 
> >> Thanks,
> >> Brian
> >> 
> >>> * Dec 27 (+4 weeks) Poppler 0.16.0
> >>> 
> >>> We are in bugfixing mode in trunk until we release Poppler 0.16.0
> >>> 
> >>> Albert
> >>> _______________________________________________
> >>> poppler mailing list
> >>> poppler at lists.freedesktop.org
> >>> http://lists.freedesktop.org/mailman/listinfo/poppler
