[poppler] [PATCH 4/4] SplashXPathScanner: Reduce complexity of sorting spans

Sat May 26 17:06:02 UTC 2018

Am 26.05.2018 um 17:48 schrieb Stefan Brüns:
> On Samstag, 26. Mai 2018 17:11:41 CEST Adam Reichold wrote:
>> Hello again,
>>
>> Am 26.05.2018 um 16:22 schrieb Stefan Brüns:
>>> Unfortunately, it gives only a 100% speed increase (i.e. 30 minutes with
>>> small_vector, 60 minutes with std::vector), so probably I should stop
>>> here.
>>
>> I am unsure about the level of irony involved, but I also do not fully
>> understand the numbers: In your previous mail you wrote
> 
> 30 minutes with std::vector<small_vector<...>>, 60 minutes with 
> std::vector<std::vector<...>>. I was only referring to the option of not using 
> boost here.
> 
>>> Before:
>>> - User time (seconds): 2979.80
>>> - Maximum resident set size (kbytes): 51208
>>>
>>> After:
>>> - User time (seconds): 1773.53
>>> - Maximum resident set size (kbytes): 45896
>>
>> i.e. ca. 60 minutes without your changes (algorithm + allocations) and
>> ca. 30 minutes with them. Now you write that changing only algorithm but
>> not changing the allocation brings you back to 60 minutes? Does this
>> mean that basically all of the speed-up is due to the use of small_vec
>> instead vector?
>>
>> If I misunderstood that and you meant the 100% w.r.t. changing algorithm
>> + allocations, then producing a measurement where just vector instead of
>> small_vec is used would probably be interesting. And it would be another
>> reason to break out that part of the patch.
> 
> Assuming just drawing one simple convex polygon with height 1200 scanlines 
> (300px with 4xAA):
> 
> 1) Old with Intersection[]:
> Uses one large vector/array for all scanline intersections of a polygon. There 
> is on scanline for each output pixel, or (typically) 4 when using 
> Antialiasing. I.e. for the simple polygon, you have 2400 intersections. The 
> size of the array grows by 2 each time (starting with 32), so the array is 
> realloced 7 times until it has space for 4096 elements.
> After filling the array, it is sorted in y/x order, i.e. O(N log N), N = 2400.

So a first measure to reserve the intersection array based on the number
of scanlines and small_vec size, e.g. 4  * scanlines to get this down to
one allocation as well?

> 2) New with std::vector<std::vector<Intersection>>
> The outer vector is set to size=1200 (number of scanlines is known a-priori). 
> Each time an intersection is added, the inner vector grows, eventually forcing 
> a reallocation. 1201 allocations are needed.
> After filling the array, each row is sorted, i.e. O(M * N log N), M = 1200, N 
> = 2.

Couldn't this also be realized using std::vector<Intersection> and a
separate std::vector<std::size_t> which keeps track of how many
intersections are within each scanline to keep everything within a
contiguous buffer? E.g.

auto first = intersections.begin();
for (y for all scanlines) {
   const auto last = first + intersections_per_scanline[y];
   std::sort(first, first + count, IntersectionCmp());
   first = last;
}

I do see that it is not as elegant as the small_vec solution, but if
this vector is reserved as above, it should almost retain the favorable
properties our your proposed solution without the Boost dependency.

Best regards, Adam.

> 3) New with std::vector<small_vector<4, Intersection>>
> As above, but the allocation for the outer vector (size 1200) also allocates 
> space for the intersections in *one contigous allocation*. Just one allocation 
> is needed.
> Sorting is as as above
> 
>                                     | allocs | sorting
> 1) old (array)                      |      7 | O(N log N), N = 2400
> 2) new (vector<vector<T>>)          |   1201 | O(M * N log N), M = 1200, N = 2
> 3) new (vector<small_vector<4, T>>) |      1 | O(M * N log N), M = 1200, N = 2
> 
> So, (1) is slow because sorting is slow, (2) is slow because it it spends most 
> of the time allocating. (3) is faster for allocations and sorting.
>
> 1) runsforever-poppler.pdf: 50 minutes
> 2) runsforever-poppler.pdf: 60 minutes
> 3) runsforever-poppler.pdf: 30 minutes
> 
> When increasing the resolution, (2) and (3) slow down linearly, while (1) 
> slows down super-linear.
> 
> 
> 
> 
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180526/eb08d13b/attachment-0001.sig>