[poppler] [PATCH 4/4] SplashXPathScanner: Reduce complexity of sorting spans
Adam Reichold
adam.reichold at t-online.de
Sat May 26 17:06:02 UTC 2018
Am 26.05.2018 um 17:48 schrieb Stefan Brüns:
> On Samstag, 26. Mai 2018 17:11:41 CEST Adam Reichold wrote:
>> Hello again,
>>
>> Am 26.05.2018 um 16:22 schrieb Stefan Brüns:
>>> Unfortunately, it gives only a 100% speed increase (i.e. 30 minutes with
>>> small_vector, 60 minutes with std::vector), so probably I should stop
>>> here.
>>
>> I am unsure about the level of irony involved, but I also do not fully
>> understand the numbers: In your previous mail you wrote
>
> 30 minutes with std::vector<small_vector<...>>, 60 minutes with
> std::vector<std::vector<...>>. I was only referring to the option of not using
> boost here.
>
>>> Before:
>>> - User time (seconds): 2979.80
>>> - Maximum resident set size (kbytes): 51208
>>>
>>> After:
>>> - User time (seconds): 1773.53
>>> - Maximum resident set size (kbytes): 45896
>>
>> i.e. ca. 60 minutes without your changes (algorithm + allocations) and
>> ca. 30 minutes with them. Now you write that changing only algorithm but
>> not changing the allocation brings you back to 60 minutes? Does this
>> mean that basically all of the speed-up is due to the use of small_vec
>> instead vector?
>>
>> If I misunderstood that and you meant the 100% w.r.t. changing algorithm
>> + allocations, then producing a measurement where just vector instead of
>> small_vec is used would probably be interesting. And it would be another
>> reason to break out that part of the patch.
>
> Assuming just drawing one simple convex polygon with height 1200 scanlines
> (300px with 4xAA):
>
> 1) Old with Intersection[]:
> Uses one large vector/array for all scanline intersections of a polygon. There
> is on scanline for each output pixel, or (typically) 4 when using
> Antialiasing. I.e. for the simple polygon, you have 2400 intersections. The
> size of the array grows by 2 each time (starting with 32), so the array is
> realloced 7 times until it has space for 4096 elements.
> After filling the array, it is sorted in y/x order, i.e. O(N log N), N = 2400.
So a first measure to reserve the intersection array based on the number
of scanlines and small_vec size, e.g. 4 * scanlines to get this down to
one allocation as well?
> 2) New with std::vector<std::vector<Intersection>>
> The outer vector is set to size=1200 (number of scanlines is known a-priori).
> Each time an intersection is added, the inner vector grows, eventually forcing
> a reallocation. 1201 allocations are needed.
> After filling the array, each row is sorted, i.e. O(M * N log N), M = 1200, N
> = 2.
Couldn't this also be realized using std::vector<Intersection> and a
separate std::vector<std::size_t> which keeps track of how many
intersections are within each scanline to keep everything within a
contiguous buffer? E.g.
auto first = intersections.begin();
for (y for all scanlines) {
const auto last = first + intersections_per_scanline[y];
std::sort(first, first + count, IntersectionCmp());
first = last;
}
I do see that it is not as elegant as the small_vec solution, but if
this vector is reserved as above, it should almost retain the favorable
properties our your proposed solution without the Boost dependency.
Best regards, Adam.
> 3) New with std::vector<small_vector<4, Intersection>>
> As above, but the allocation for the outer vector (size 1200) also allocates
> space for the intersections in *one contigous allocation*. Just one allocation
> is needed.
> Sorting is as as above
>
> | allocs | sorting
> 1) old (array) | 7 | O(N log N), N = 2400
> 2) new (vector<vector<T>>) | 1201 | O(M * N log N), M = 1200, N = 2
> 3) new (vector<small_vector<4, T>>) | 1 | O(M * N log N), M = 1200, N = 2
>
> So, (1) is slow because sorting is slow, (2) is slow because it it spends most
> of the time allocating. (3) is faster for allocations and sorting.
>
> 1) runsforever-poppler.pdf: 50 minutes
> 2) runsforever-poppler.pdf: 60 minutes
> 3) runsforever-poppler.pdf: 30 minutes
>
> When increasing the resolution, (2) and (3) slow down linearly, while (1)
> slows down super-linear.
>
>
>
>
> _______________________________________________
> poppler mailing list
> poppler at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/poppler
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: OpenPGP digital signature
URL: <https://lists.freedesktop.org/archives/poppler/attachments/20180526/eb08d13b/attachment-0001.sig>
More information about the poppler
mailing list