[Pixman] [PATCH 3/3] Add SSSE3 fast path skeleton

Andrea Canciani ranma42 at gmail.com
Tue Jan 4 01:54:01 PST 2011


On Tue, Jan 4, 2011 at 10:22 AM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
> On Wednesday 08 December 2010 16:13:48 Liu Xinyun wrote:
>> There is a performance test between commits
>> commit 56777f3f675869806cd30bcd21a5b39d788507cb
>> Author: Dmitri Vorobiev <dmitri.vorobiev at movial.com>
>> Date:   Wed Sep 22 12:34:57 2010 +0300
>>
>>     Use <sys/mman.h> macros only when they are available
>>
>> commit 3d094997b1820719d15cec7dc633ed37e1912bfc
>> Author: Siarhei Siamashka <siarhei.siamashka at nokia.com>
>> Date:   Tue Nov 30 00:31:06 2010 +0200
>>
>>     Fix for potential unaligned memory accesses
>
> Right, if we look at the changes between these two commits:
>
>> new: ba69989374fe9cbe5151c5aac7b824da0806f94a
>> Speedups
>> ========
>> image-rgba                  ocitysmap-0    7476.29 (7520.52 3.09%) -> 6817.63 (6822.39 3.04%):  1.10x speedup
>> image-rgba                    poppler-0    15028.69 (15029.97 0.71%) -> 13748.83 (13793.77 0.91%):  1.09x speedup
>> image-rgba         gnome-terminal-vim-0    23378.23 (23494.57 0.70%) -> 21916.17 (21926.92 0.52%):  1.07x speedup
>> image-rgba          xfce4-terminal-a1-0    16632.31 (16637.27 0.60%) -> 15630.85 (15650.89 0.63%):  1.06x speedup
>> image-rgba       firefox-planet-gnome-0    87751.74 (87809.43 0.18%) -> 82620.93 (82949.20 0.27%):  1.06x speedup
>> image-rgba          firefox-talos-gfx-0    51169.66 (51542.49 0.36%) -> 48572.45 (48610.13 0.26%):  1.05x speedup
>> image-rgba         swfdec-giant-steps-0    11619.84 (11646.91 0.46%) -> 11056.41 (11057.69 0.40%):  1.05x speedup
>
> Removal of software prefetch provides a good performance improvement. It was a
> nice catch, thanks for that patch. But as a side effect, it also makes SSE2
> optimizations more competitive and somewhat harder to beat by SSSE3.
>
>> new: 1ca715ed1e6914e9bd9f050065e827d7a9e2efc9
>> Slowdowns
>> =========
>> image-rgba          firefox-talos-svg-0    177350.25 (181525.48 1.11%) -> 198170.08 (202124.06 0.96%):  1.12x slowdown
>> image-rgba             swfdec-youtube-0    15843.51 (15850.03 0.76%) -> 18104.51 (18224.94 0.40%):  1.14x slowdown
>
>> new: 1d4f2d71facd5f2bbce74fbe3407ccea6cf4bea1
>> Slowdowns
>> =========
>> image-rgba          firefox-talos-svg-0    197582.42 (202455.51 1.19%) -> 208636.52 (212057.02 0.90%):  1.06x slowdown
>
> And these two slowdowns are caused by the changes in radial gradients code.
> I guess it can't be helped because now radial gradients are supposed to be more
> correct.

I'm quite confident that SIMD makes it possible to speed up the computation of
the interpolation parameter of gradients (both linear and radial).
If someone is interested in improving the performance of radial gradients, I can
point out some transformations which remove basically all of the branching and
make the computation very easy to perform in parallel.

I have already implemented them (the idea should be correct, but the code is
untested) in:
http://cgit.freedesktop.org/~ranma42/cairo/commit/?h=wip/gl2&id=5814ec1c2cec04c4e6cbb70ff2b24d8e157ff321

As a side note, if we want SIMD gradients, we should probably profile the
gradient walker. I expect quite a lot of time to be spent there, at least for
linear gradients.
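To make concrete what would be profiled: a gradient walker turns each interpolation parameter t into a color by locating the pair of color stops around t and interpolating between them, caching that pair so consecutive pixels can usually skip the search. The following is a simplified, hypothetical walker (grad_stop, grad_walker, walker_pixel and friends are illustrative names, not pixman's internal API), assuming sorted stops from 0.0 to 1.0, PAD extend and non-premultiplied 8-bit ARGB.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified walker (not pixman's internal API).
 * Assumes stops sorted by x, first x == 0.0, last x == 1.0, PAD extend,
 * and non-premultiplied 8-bit ARGB colors. */
typedef struct { double x; uint32_t argb; } grad_stop;

typedef struct {
    const grad_stop *stops;
    int n;
    double left_x, right_x;   /* cached interval */
    uint32_t left, right;     /* colors at its ends */
    int resets;               /* how often the cached interval was replaced */
} grad_walker;

static void walker_init(grad_walker *w, const grad_stop *stops, int n)
{
    assert(n >= 2);
    w->stops = stops;
    w->n = n;
    w->left_x = 1.0;          /* empty interval: the first lookup always resets */
    w->right_x = 0.0;
    w->left = w->right = 0;
    w->resets = 0;
}

/* Slow path: linear search for the pair of stops around t. */
static void walker_reset(grad_walker *w, double t)
{
    int i = 0;
    while (i < w->n - 2 && t > w->stops[i + 1].x)
        i++;
    w->left_x = w->stops[i].x;
    w->left = w->stops[i].argb;
    w->right_x = w->stops[i + 1].x;
    w->right = w->stops[i + 1].argb;
    w->resets++;
}

/* Interpolates one 8-bit channel and rounds to the nearest integer. */
static uint32_t lerp_channel(uint32_t a, uint32_t b, int shift, double f)
{
    double ca = (a >> shift) & 0xff, cb = (b >> shift) & 0xff;
    return (uint32_t)(ca + (cb - ca) * f + 0.5) << shift;
}

/* Fast path: reuse the cached interval while t stays inside it. */
static uint32_t walker_pixel(grad_walker *w, double t)
{
    t = t < 0.0 ? 0.0 : (t > 1.0 ? 1.0 : t);   /* PAD extend */
    if (t < w->left_x || t > w->right_x)
        walker_reset(w, t);
    double span = w->right_x - w->left_x;
    double f = span > 0.0 ? (t - w->left_x) / span : 0.0;
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8)
        out |= lerp_channel(w->left, w->right, shift, f);
    return out;
}
```

With stops black at 0.0 and white at 1.0, walker_pixel at t = 0.5 yields 0xff808080. Counting resets like this is one cheap way to tell whether time goes into the interval search or into the per-pixel interpolation, which is the part a SIMD version would mainly target.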

Andrea

PS: I'm guilty of the gradient slowdown, but on the bright side it seems to be
the best implementation of radial gradients available, surpassing Quartz,
Ghostscript, poppler, Adobe Reader, Acrobat Pro X... in correctness and/or
numerical stability.

