[Pixman] [PATCH 2/2] ssse3: Add iterator for separable bilinear scaling

Siarhei Siamashka siarhei.siamashka at gmail.com
Thu Sep 5 18:00:59 PDT 2013

On Thu, 29 Aug 2013 13:02:53 -0400
"Søren Sandmann Pedersen" <sandmann at cs.au.dk> wrote:

> This new iterator uses the SSSE3 instructions pmaddubsw and pabsw to
> implement a fast iterator for bilinear scaling.

This patch shows some really good performance for upscaling. In fact
even better than I expected. And the trick with PABSW is very nice.

> There is a graph here recording the per-pixel time for various
> bilinear scaling algorithms as reported by scaling-bench:
>     http://people.freedesktop.org/~sandmann/ssse3/ssse3.png

I wonder if the discontinuities in the lines on the graph are caused by
the calloc behaviour:


On my system, doing explicit memset or solid fill to the allocated
memory (before starting the timer) resulted in generally lower measured
times and less chaotic graphs. Also, running the tests multiple times
for each scaling ratio and selecting the best result seems to filter
out even more measurement noise.
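
A minimal sketch of that measurement approach, in case it is useful. It
is not taken from scaling-bench: the helper names (alloc_touched,
best_of, copy_pixels), the 0xff fill value and the use of clock() as the
timer are all assumptions made for illustration. The idea is that
calloc'ed memory may not be backed by real pages until it is first
written, so the buffers are written once before timing, and each
measurement is repeated with only the fastest run kept.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature for the operation being benchmarked. */
typedef void (*bench_fn) (uint32_t *dst, const uint32_t *src, size_t n);

/* Allocate n_pixels 32-bit pixels and write to every byte, so that any
 * lazily mapped pages are touched before the timer is started.
 * The 0xff fill value is arbitrary. */
static uint32_t *
alloc_touched (size_t n_pixels)
{
    uint32_t *buf = calloc (n_pixels, sizeof *buf);

    if (buf)
        memset (buf, 0xff, n_pixels * sizeof *buf);

    return buf;
}

/* Run fn 'repeats' times and return the shortest time in seconds.
 * clock() is used only because it is standard C; a real benchmark
 * would likely use a higher resolution timer. */
static double
best_of (bench_fn fn, uint32_t *dst, const uint32_t *src,
         size_t n, int repeats)
{
    double best = -1.0;
    int i;

    for (i = 0; i < repeats; i++)
    {
        clock_t t0 = clock ();
        double t;

        fn (dst, src, n);
        t = (double) (clock () - t0) / CLOCKS_PER_SEC;

        if (best < 0.0 || t < best)
            best = t;
    }

    return best;
}

/* Stand-in workload; the real test would call the scaling code. */
static void
copy_pixels (uint32_t *dst, const uint32_t *src, size_t n)
{
    memcpy (dst, src, n * sizeof *dst);
}
```

With these helpers, one data point would be obtained by allocating both
buffers with alloc_touched() and then calling something like
best_of (copy_pixels, dst, src, n, 10) for each scaling ratio.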

> As the graph shows, this new iterator is clearly faster than the
> existing C iterator, and when used with an SSE2 combiner, it is also
> faster than the existing SSE2 fast paths except for the lowest scaling
> ratios.
>
> The data was measured on an Ivy Bridge i7-3520M @ 2.0GHz and is
> available in this directory:
>     http://people.freedesktop.org/~sandmann/ssse3/

Just spotted one problem with the patch. Compilation in 32-bit mode
fails with "undefined reference to `_mm_cvtsi128_si64'". Looks like
_mm_storel_epi64 needs to be used instead of _mm_cvtsi128_si64.
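
To illustrate the suggested replacement: _mm_cvtsi128_si64 moves the low
64 bits of an XMM register into a 64-bit general purpose register, which
only exists in 64-bit mode, while _mm_storel_epi64 writes the same 64
bits directly to memory and is available in both modes. The sketch below
is not the code from the patch; the function names (store_low_64,
demo_store_two_pixels) and the assumption that the destination holds two
32-bit pixels are made up for the example.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Store the low 64 bits of v (two 32-bit pixels) to dst.
 * _mm_storel_epi64 has no alignment requirement on dst and compiles
 * in both 32-bit and 64-bit mode. */
static void
store_low_64 (uint32_t *dst, __m128i v)
{
    _mm_storel_epi64 ((__m128i *) dst, v);
}

/* Hypothetical demo: pack two pixels into the low half of a vector
 * (p0 in the lowest 32 bits) and store them to out[0] and out[1]. */
static void
demo_store_two_pixels (uint32_t out[2], uint32_t p0, uint32_t p1)
{
    __m128i v = _mm_set_epi32 (0, 0, (int) p1, (int) p0);

    store_low_64 (out, v);
}
```

The 64-bit-only form would have been along the lines of
*(uint64_t *) dst = _mm_cvtsi128_si64 (v), which is what fails to link
in a 32-bit build.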

Best regards,
Siarhei Siamashka
