[poppler] [PATCH 4/4] SplashXPathScanner: Reduce complexity of sorting spans

Stefan Brüns stefan.bruens at rwth-aachen.de
Sat May 26 15:48:54 UTC 2018


On Samstag, 26. Mai 2018 17:11:41 CEST Adam Reichold wrote:
> Hello again,
> 
> Am 26.05.2018 um 16:22 schrieb Stefan Brüns:
> > Unfortunately, it gives only a 100% speed increase (i.e. 30 minutes with
> > small_vector, 60 minutes with std::vector), so probably I should stop
> > here.
> 
> I am unsure about the level of irony involved, but I also do not fully
> understand the numbers: In your previous mail you wrote

30 minutes with std::vector<small_vector<...>>, 60 minutes with 
std::vector<std::vector<...>>. I was only referring to the option of not using 
boost here.

> > Before:
> > - User time (seconds): 2979.80
> > - Maximum resident set size (kbytes): 51208
> > 
> > After:
> > - User time (seconds): 1773.53
> > - Maximum resident set size (kbytes): 45896
> 
> i.e. ca. 60 minutes without your changes (algorithm + allocations) and
> ca. 30 minutes with them. Now you write that changing only algorithm but
> not changing the allocation brings you back to 60 minutes? Does this
> mean that basically all of the speed-up is due to the use of small_vec
> instead vector?
> 
> If I misunderstood that and you meant the 100% w.r.t. changing algorithm
> + allocations, then producing a measurement where just vector instead of
> small_vec is used would probably be interesting. And it would be another
> reason to break out that part of the patch.

Assuming just drawing one simple convex polygon with height 1200 scanlines 
(300px with 4xAA):

1) Old with Intersection[]:
Uses one large vector/array for all scanline intersections of a polygon. There 
is on scanline for each output pixel, or (typically) 4 when using 
Antialiasing. I.e. for the simple polygon, you have 2400 intersections. The 
size of the array grows by 2 each time (starting with 32), so the array is 
realloced 7 times until it has space for 4096 elements.
After filling the array, it is sorted in y/x order, i.e. O(N log N), N = 2400.

2) New with std::vector<std::vector<Intersection>>
The outer vector is set to size=1200 (number of scanlines is known a-priori). 
Each time an intersection is added, the inner vector grows, eventually forcing 
a reallocation. 1201 allocations are needed.
After filling the array, each row is sorted, i.e. O(M * N log N), M = 1200, N 
= 2.

3) New with std::vector<small_vector<4, Intersection>>
As above, but the allocation for the outer vector (size 1200) also allocates 
space for the intersections in *one contigous allocation*. Just one allocation 
is needed.
Sorting is as as above

                                    | allocs | sorting
1) old (array)                      |      7 | O(N log N), N = 2400
2) new (vector<vector<T>>)          |   1201 | O(M * N log N), M = 1200, N = 2
3) new (vector<small_vector<4, T>>) |      1 | O(M * N log N), M = 1200, N = 2

So, (1) is slow because sorting is slow, (2) is slow because it it spends most 
of the time allocating. (3) is faster for allocations and sorting.

1) runsforever-poppler.pdf: 50 minutes
2) runsforever-poppler.pdf: 60 minutes
3) runsforever-poppler.pdf: 30 minutes

When increasing the resolution, (2) and (3) slow down linearly, while (1) 
slows down super-linear.


-- 
Stefan Brüns  /  Bergstraße 21  /  52062 Aachen
home: +49 241 53809034     mobile: +49 151 50412019
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 195 bytes
Desc: This is a digitally signed message part.
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180526/f18732ed/attachment.sig>


More information about the poppler mailing list