[poppler] Poppler 0.15.2 (0.16 Beta 2) released
Albert Astals Cid
aacid at kde.org
Sat Nov 20 04:58:40 PST 2010
On Saturday, 20 November 2010, Albert Astals Cid wrote:
> Hi, (please do not mail me directly unless you really have to)
>
> On Saturday, 20 November 2010, Brian Ewins wrote:
> > On 20 Nov 2010, at 00:22, Albert Astals Cid <aacid at kde.org> wrote:
> > > On Monday, 15 November 2010, Baz wrote:
> > >> On 14 November 2010 16:23, Albert Astals Cid <aacid at kde.org> wrote:
> > >>> BTW, here is the updated release schedule:
> > >>>
> > >>> * Nov 29 (+2 weeks) Poppler 0.15.3 (0.16 RC)
> > >>
> > >> Can you consider the performance bugfix on bug 3188 for this release?
> > >> https://bugs.freedesktop.org/show_bug.cgi?id=3188#c65
> > >>
> > >> Marek commented on the bug that he's working on further changes, but
> > >> that's to fix a different section of slow code triggered by the same
> > >> test document that Dennis Sheil mentioned on the list
> > >> (http://www.ratp.info/picts/touristes/photos/plan%20paris-touriste.pdf).
> > >
> > > This fix changes pdftotext output; I've asked the author whether that is
> > > expected (I'd expect only more speed, not different output).
> >
> > The heuristic has changed slightly, so yes, there are circumstances where
> > you could get different output; I haven't seen an example
>
> I can send you the pdf file if you want.
>
> > (though I guess it is
> > likely to happen somewhere in that scattershot bus map). Two things have
> > changed: the initial sort order and the heuristic for deciding which
> > blocks to visit first. I think only the first of these changes the
> > results; a long explanation follows.
> >
> > The previous heuristic said block A must be visited before block B if it
> > is entirely to the left of B and there is no block C that is above A,
> > below B, and overlaps both horizontally. The new heuristic avoids an
> > explicit search for block C by tracking an interval that starts off as
> > the horizontal bounds of B and widens to cover any blocks that it
> > overlaps as we move down the page. In the 3 block case, this is the
> > same, but it differs when you have 5 or more blocks: If there is a block
> > D that overlaps neither A nor B, and blocks E, F such that E overlaps A
> > and D, and F overlaps D and B, then A will be marked to visit before B.
> > However, this case would have happened before by induction anyway: D
> > would have been visited before B, and A before D under the old rule. So
> > I don't think this change did anything other than improve speed. It
> > relies on the blocks having been pre-sorted vertically, though, which is
> > the other change.
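The two rules described above can be sketched in Python to make the difference concrete. This is a minimal illustration under stated assumptions, not poppler's TextOutputDev code: blocks are axis-aligned boxes with y growing downward, "overlaps horizontally" means the x-ranges intersect, and the new rule is read as one top-to-bottom sweep per block B that widens an interval. `Block`, `before_old` and `predecessors_new` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical axis-aligned text block; y grows downward (page coordinates)."""
    name: str
    x0: float  # left edge
    x1: float  # right edge
    y0: float  # top edge
    y1: float  # bottom edge


def overlaps_x(p: Block, q: Block) -> bool:
    """True if the horizontal extents of p and q intersect."""
    return p.x0 < q.x1 and q.x0 < p.x1


def before_old(a: Block, b: Block, blocks: List[Block]) -> bool:
    """Old rule: A precedes B if A is entirely left of B and there is no
    block C above A, below B, overlapping both (explicit search for C)."""
    if a.x1 > b.x0:
        return False
    for c in blocks:
        if c is a or c is b:
            continue
        if (c.y0 >= b.y1 and c.y1 <= a.y0
                and overlaps_x(c, a) and overlaps_x(c, b)):
            return False
    return True


def predecessors_new(b: Block, blocks: List[Block]) -> List[str]:
    """New rule as a single sweep: an interval starts as B's horizontal
    bounds and widens to cover every block below B that it overlaps.
    A block left of B precedes B only if the interval has not reached it."""
    lo, hi = b.x0, b.x1
    result = []
    # Relies on the blocks being sorted vertically (top edge, then left edge).
    for x in sorted(blocks, key=lambda blk: (blk.y0, blk.x0)):
        if x is b:
            continue
        left_of_b = x.x1 <= b.x0
        if x.y0 < b.y1:
            # x is not below B, so no block can sit between them.
            if left_of_b:
                result.append(x.name)
            continue
        touches = x.x0 < hi and lo < x.x1
        if left_of_b and not touches:
            result.append(x.name)
        if touches:
            lo, hi = min(lo, x.x0), max(hi, x.x1)
    return result


if __name__ == "__main__":
    b = Block("B", 300, 500, 0, 100)  # top of the right column
    a = Block("A", 0, 200, 300, 400)  # bottom of the left column
    c = Block("C", 0, 500, 150, 250)  # full-width block between them
    print(before_old(a, b, [a, b, c]), predecessors_new(b, [a, b, c]))  # False []
    print(before_old(a, b, [a, b]), predecessors_new(b, [a, b]))        # True ['A']
```

In the three-block case both versions agree: the full-width block C stops A from being placed before B, and without it both allow it. The sweep computes the interval once per B instead of searching for C for every pair; how it behaves on longer chains of overlapping blocks depends on details the description leaves open.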
> >
> > If the heuristic has no way to decide which of two blocks
> > should be visited first, previously it would visit the first one in
> > physical order. Now it visits the one closest to the top, and the leftmost
> > if they are at the same height (or top right for RTL, etc.). This tends to
> > be the same but not always; the bus map labels were in a random order
> > physically, for example. However, for normal text, top left is a decent
> > guess. Where it can go wrong is, e.g., a two-column document with
> > leading vertical space in the left column, where the left column ends up
> > overlapping the right (due to some non-rectangular layout). In this case
> > the heuristic will not spot that the left column should have been
> > first.
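The new tie-break between blocks the rules cannot order amounts to a sort key: topmost first, then leftmost (rightmost for RTL). A minimal Python sketch, assuming "physical order" means the order blocks appear in the content stream and that y grows downward; `Block`, `tie_break_key` and `order_ties` are hypothetical names, not poppler API. The last case reproduces the failure described above: two slightly overlapping columns where the left one starts lower.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """Hypothetical text block; y grows downward (page coordinates)."""
    name: str
    x0: float  # left edge
    x1: float  # right edge
    y0: float  # top edge
    y1: float  # bottom edge


def tie_break_key(blk: Block, rtl: bool = False):
    """Topmost first; at equal height, leftmost first (rightmost for RTL)."""
    return (blk.y0, -blk.x1 if rtl else blk.x0)


def order_ties(blocks: List[Block], rtl: bool = False) -> List[str]:
    """Order blocks that the visiting rules could not decide between."""
    return [blk.name for blk in sorted(blocks, key=lambda blk: tie_break_key(blk, rtl))]


if __name__ == "__main__":
    # Labels stored in arbitrary physical (content-stream) order, as on the bus map.
    labels = [Block("c", 50, 90, 40, 50), Block("a", 10, 40, 10, 20), Block("b", 60, 90, 10, 20)]
    print(order_ties(labels))            # ['a', 'b', 'c']
    print(order_ties(labels, rtl=True))  # ['b', 'a', 'c']
    # Left column has leading vertical space and overlaps the right one a little.
    left = Block("left column", 0, 320, 120, 700)
    right = Block("right column", 300, 500, 100, 700)
    print(order_ties([left, right]))     # ['right column', 'left column']
```

Sorting by top edge first is what makes ordinary text come out right, and it is also why the left column loses here: nothing in the key knows that the two blocks are columns.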
>
> Well, reading this long description I don't see this as an optimization but
> as a [small] behaviour change. We are past the feature freeze, so I'm a bit
> hesitant to let this in. Anyway, here's what I'll do:
> * Run the test suite and see on how many files pdftotext gives a different
> result with and without the patch
> * If the number of files is relatively small, check whether the differences
> are improvements and, if not, whether we can "live" with the changes.
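The first step of that plan can be scripted. A minimal Python sketch, assuming two pdftotext builds (without and with the patch) at hypothetical paths `./pdftotext-unpatched` and `./pdftotext-patched`; it relies only on pdftotext's documented behaviour of writing to stdout when the output file is `-`. The files it lists are the ones to inspect by hand for the second step.

```python
import subprocess
import sys
from typing import Callable, Iterable, List

# Hypothetical paths to the two builds being compared.
OLD_PDFTOTEXT = "./pdftotext-unpatched"
NEW_PDFTOTEXT = "./pdftotext-patched"


def run_pdftotext(binary: str, pdf: str) -> str:
    """Extract text with `pdftotext <pdf> -` (output file "-" means stdout)."""
    done = subprocess.run([binary, pdf, "-"], capture_output=True, check=True)
    return done.stdout.decode("utf-8", errors="replace")


def changed_files(pdfs: Iterable[str],
                  run_old: Callable[[str], str],
                  run_new: Callable[[str], str]) -> List[str]:
    """Return the PDFs whose extracted text differs between the two builds."""
    return [pdf for pdf in pdfs if run_old(pdf) != run_new(pdf)]


if __name__ == "__main__":
    pdfs = sys.argv[1:]
    diffs = changed_files(pdfs,
                          lambda p: run_pdftotext(OLD_PDFTOTEXT, p),
                          lambda p: run_pdftotext(NEW_PDFTOTEXT, p))
    print(f"{len(diffs)} of {len(pdfs)} files differ")
    for pdf in diffs:
        print(pdf)
```

Passing the extractors in as functions keeps the comparison independent of how the two builds are invoked.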
Found another one where the difference is not acceptable: it moves the "2)"
from the correct position to a wrong one.
I'm attaching the file in case you want to have a look and see if you can fix
the regression.
Albert
>
> > As I've said in the past, we'll get better results than this if we take
> > the reading order from tagged PDF. Otherwise it is just guesswork.
>
> Patches welcome ;-)
>
> Albert
>
> > > Albert
> > >
> > >> Thanks,
> > >> Brian
> > >>
> > >>> * Dec 27 (+4 weeks) Poppler 0.16.0
> > >>>
> > >>> We are in bugfixing mode in trunk until we release Poppler 0.16.0
> > >>>
> > >>> Albert
> > >>> _______________________________________________
> > >>> poppler mailing list
> > >>> poppler at lists.freedesktop.org
> > >>> http://lists.freedesktop.org/mailman/listinfo/poppler
-------------- next part --------------
A non-text attachment was scrubbed...
Name: alumnes_normativa.pdf
Type: application/pdf
Size: 31210 bytes
Desc: not available
URL: <http://lists.freedesktop.org/archives/poppler/attachments/20101120/5431f9ee/attachment-0001.pdf>