[poppler] [PATCH 4/4] SplashXPathScanner: Reduce complexity of sorting spans
Stefan Brüns
stefan.bruens at rwth-aachen.de
Sat May 26 15:48:54 UTC 2018
On Samstag, 26. Mai 2018 17:11:41 CEST Adam Reichold wrote:
> Hello again,
>
> Am 26.05.2018 um 16:22 schrieb Stefan Brüns:
> > Unfortunately, it gives only a 100% speed increase (i.e. 30 minutes with
> > small_vector, 60 minutes with std::vector), so probably I should stop
> > here.
>
> I am unsure about the level of irony involved, but I also do not fully
> understand the numbers: In your previous mail you wrote
30 minutes with std::vector<small_vector<...>>, 60 minutes with
std::vector<std::vector<...>>. I was only referring to the option of not using
boost here.
> > Before:
> > - User time (seconds): 2979.80
> > - Maximum resident set size (kbytes): 51208
> >
> > After:
> > - User time (seconds): 1773.53
> > - Maximum resident set size (kbytes): 45896
>
> i.e. ca. 60 minutes without your changes (algorithm + allocations) and
> ca. 30 minutes with them. Now you write that changing only algorithm but
> not changing the allocation brings you back to 60 minutes? Does this
> mean that basically all of the speed-up is due to the use of small_vec
> instead vector?
>
> If I misunderstood that and you meant the 100% w.r.t. changing algorithm
> + allocations, then producing a measurement where just vector instead of
> small_vec is used would probably be interesting. And it would be another
> reason to break out that part of the patch.
Assuming just drawing one simple convex polygon with height 1200 scanlines
(300px with 4xAA):
1) Old with Intersection[]:
Uses one large vector/array for all scanline intersections of a polygon. There
is on scanline for each output pixel, or (typically) 4 when using
Antialiasing. I.e. for the simple polygon, you have 2400 intersections. The
size of the array grows by 2 each time (starting with 32), so the array is
realloced 7 times until it has space for 4096 elements.
After filling the array, it is sorted in y/x order, i.e. O(N log N), N = 2400.
2) New with std::vector<std::vector<Intersection>>
The outer vector is set to size=1200 (number of scanlines is known a-priori).
Each time an intersection is added, the inner vector grows, eventually forcing
a reallocation. 1201 allocations are needed.
After filling the array, each row is sorted, i.e. O(M * N log N), M = 1200, N
= 2.
3) New with std::vector<small_vector<4, Intersection>>
As above, but the allocation for the outer vector (size 1200) also allocates
space for the intersections in *one contigous allocation*. Just one allocation
is needed.
Sorting is as as above
| allocs | sorting
1) old (array) | 7 | O(N log N), N = 2400
2) new (vector<vector<T>>) | 1201 | O(M * N log N), M = 1200, N = 2
3) new (vector<small_vector<4, T>>) | 1 | O(M * N log N), M = 1200, N = 2
So, (1) is slow because sorting is slow, (2) is slow because it it spends most
of the time allocating. (3) is faster for allocations and sorting.
1) runsforever-poppler.pdf: 50 minutes
2) runsforever-poppler.pdf: 60 minutes
3) runsforever-poppler.pdf: 30 minutes
When increasing the resolution, (2) and (3) slow down linearly, while (1)
slows down super-linear.
--
Stefan Brüns / Bergstraße 21 / 52062 Aachen
home: +49 241 53809034 mobile: +49 151 50412019
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180526/f18732ed/attachment.sig>
More information about the poppler
mailing list