[Pixman] [PATCH] sse2: Add a fast path for add_n_8888
Søren Sandmann
sandmann at cs.au.dk
Wed Jan 2 10:40:58 PST 2013
Chris Wilson <chris at chris-wilson.co.uk> writes:
> This path is being exercised by inplace compositing of trapezoids, for
> instance as used in the firefox-asteroids cairo-trace.
>
> core2 @ 2.66GHz,
>
> reference memcpy speed = 4898.2MB/s (1224.6MP/s for 32bpp fills)
>
> before: add_n_8888 = L1: 4.36 L2: 4.27 M: 1.61 ( 0.13%) HT: 1.65 VT: 1.63 R: 1.63 RT: 1.59 ( 21Kops/s)
>
> after: add_n_8888 = L1:2969.09 L2:3926.11 M:603.30 ( 49.27%) HT:524.69 VT:401.01 R:407.59 RT:210.34 ( 804Kops/s)
Just two brief comments, and then I'll disappear again (until the 11th
or so):
- It looks like this function will work for abgr destinations as well as
argb.
- I'm surprised that the new function is _that_ much better. The current
code should hit an SSE2 combiner and noop iterators for both source
and destination, so while I'd expect a solid improvement from a
dedicated fast path, it is hard to believe that it would be 919 times
faster than the old code (the L2 figures: 3926.11 vs. 4.27). If these
numbers are real, there has to be something wrong with either the
benchmark or the current code.
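To illustrate the first comment: the ADD operator is a per-channel saturating add, so the arithmetic does not care which byte holds red and which holds blue. The following is a minimal scalar sketch, not the patch under review. All function names here are hypothetical, and it assumes the solid source colour has already been converted to the destination's channel order. An SSE2 version would do the same thing 16 bytes at a time with `_mm_adds_epu8`.

```c
#include <stddef.h>
#include <stdint.h>

/* Saturating add of one 8-bit channel: clamps at 0xff instead of wrapping. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 0xff ? 0xff : sum);
}

/* Add a solid 32-bit colour to one pixel, one channel at a time.
 * Every byte is treated identically, whatever channel it represents. */
static uint32_t add_pixel_8888(uint32_t dst, uint32_t src)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t d = (uint8_t)(dst >> shift);
        uint8_t s = (uint8_t)(src >> shift);
        out |= (uint32_t)add_sat_u8(d, s) << shift;
    }
    return out;
}

/* Hypothetical scalar stand-in for an add_n_8888 scanline loop. */
static void add_n_8888_scalar(uint32_t *dst, uint32_t src, size_t width)
{
    for (size_t i = 0; i < width; i++)
        dst[i] = add_pixel_8888(dst[i], src);
}

/* Swap bytes 0 and 2, i.e. convert between argb and abgr channel order. */
static uint32_t swap_rb(uint32_t p)
{
    return (p & 0xff00ff00u) | ((p >> 16) & 0xffu) | ((p & 0xffu) << 16);
}
```

Because every byte goes through the same operation, adding in argb order and then swapping red and blue gives the same pixel as swapping both inputs first and adding in abgr order.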
Soren