[Pixman] [PATCH 3/3] Add SSSE3 fast path skeleton
Andrea Canciani
ranma42 at gmail.com
Tue Jan 4 01:54:01 PST 2011
On Tue, Jan 4, 2011 at 10:22 AM, Siarhei Siamashka
<siarhei.siamashka at gmail.com> wrote:
> On Wednesday 08 December 2010 16:13:48 Liu Xinyun wrote:
>> There is a performance test between commits
>> commit 56777f3f675869806cd30bcd21a5b39d788507cb
>> Author: Dmitri Vorobiev <dmitri.vorobiev at movial.com>
>> Date: Wed Sep 22 12:34:57 2010 +0300
>>
>> Use <sys/mman.h> macros only when they are available
>>
>> commit 3d094997b1820719d15cec7dc633ed37e1912bfc
>> Author: Siarhei Siamashka <siarhei.siamashka at nokia.com>
>> Date: Tue Nov 30 00:31:06 2010 +0200
>>
>> Fix for potential unaligned memory accesses
>
> Right, if we look at the changes between these two commits:
>
>> new: ba69989374fe9cbe5151c5aac7b824da0806f94a
>> Speedups
>> ========
>> image-rgba ocitysmap-0 7476.29 (7520.52 3.09%) ->
>> 6817.63 (6822.39 3.04%): 1.10x speedup ▏
>> image-rgba poppler-0 15028.69 (15029.97 0.71%) ->
>> 13748.83 (13793.77 0.91%): 1.09x speedup ▏
>> image-rgba gnome-terminal-vim-0 23378.23 (23494.57 0.70%) ->
>> 21916.17 (21926.92 0.52%): 1.07x speedup ▏
>> image-rgba xfce4-terminal-a1-0 16632.31 (16637.27 0.60%) ->
>> 15630.85 (15650.89 0.63%): 1.06x speedup ▏
>> image-rgba firefox-planet-gnome-0 87751.74 (87809.43 0.18%) ->
>> 82620.93 (82949.20 0.27%): 1.06x speedup
>>
>> image-rgba firefox-talos-gfx-0 51169.66 (51542.49 0.36%) ->
>> 48572.45 (48610.13 0.26%): 1.05x speedup
>>
>> image-rgba swfdec-giant-steps-0 11619.84 (11646.91 0.46%) ->
>> 11056.41 (11057.69 0.40%): 1.05x speedup
>
> Removal of software prefetch provides a good performance improvement. It was a
> nice catch, thanks for that patch. But as a side effect, it also makes SSE2
> optimizations more competitive and somewhat harder to beat by SSSE3.
>
>> new: 1ca715ed1e6914e9bd9f050065e827d7a9e2efc9
>> Slowdowns
>> =========
>> image-rgba firefox-talos-svg-0 177350.25 (181525.48 1.11%) ->
>> 198170.08 (202124.06 0.96%): 1.12x slowdown ▏
>> image-rgba swfdec-youtube-0 15843.51 (15850.03 0.76%) ->
>> 18104.51 (18224.94 0.40%): 1.14x slowdown ▏
>
>> new: 1d4f2d71facd5f2bbce74fbe3407ccea6cf4bea1
>> Slowdowns
>> =========
>> image-rgba firefox-talos-svg-0 197582.42 (202455.51 1.19%) ->
>> 208636.52 (212057.02 0.90%): 1.06x slowdown
>
> And these two slowdowns are caused by the changes in radial gradients code.
> I guess it can't be helped because now radial gradients are supposed to be more
> correct.
I'm quite confident that SIMD makes it possible to speed up the computation of
the interpolation parameter of gradients (both linear and radial).
If someone is interested in improving the performance of radial gradients, I can
point out some transformations which remove basically all of the branching and
make the computation very easy to perform in parallel.
I have already implemented them (the idea should be correct, but the code is
untested) in:
http://cgit.freedesktop.org/~ranma42/cairo/commit/?h=wip/gl2&id=5814ec1c2cec04c4e6cbb70ff2b24d8e157ff321
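To make the linear case concrete, here is a minimal sketch (not taken from the
commit above, and not pixman code) of why the parameter is easy to compute in
parallel: along a scanline t is affine in x, so several consecutive pixels can
be derived from one starting value and a constant step, with no branches. The
function name linear_gradient_t4 and its signature are hypothetical; the sketch
assumes the two gradient endpoints are distinct and ignores any image transform.

```c
/* Hypothetical helper, for illustration only.
 *
 * Computes the interpolation parameter t of a linear gradient running
 * from (x1, y1) to (x2, y2) for four horizontally adjacent pixels whose
 * first sample point is (x, y):
 *
 *     t = ((p - p1) . (p2 - p1)) / |p2 - p1|^2
 *
 * Assumes (x1, y1) != (x2, y2), so the division is well defined.
 */
static void
linear_gradient_t4 (float x1, float y1, float x2, float y2,
                    float x, float y, float t[4])
{
    float dx = x2 - x1;
    float dy = y2 - y1;
    float inv_len2 = 1.0f / (dx * dx + dy * dy);

    /* Parameter at the first pixel of the group. */
    float t0 = ((x - x1) * dx + (y - y1) * dy) * inv_len2;

    /* Moving one pixel to the right changes t by a constant amount. */
    float step = dx * inv_len2;
    int i;

    /* Four independent lanes: a direct fit for one SIMD register. */
    for (i = 0; i < 4; i++)
        t[i] = t0 + step * (float) i;
}
```

The loop body has no data-dependent control flow, so it maps directly onto a
single vector multiply-add; the radial case needs the transformations mentioned
above before it reaches a similar shape.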
As a side note, if we want SIMD gradients, we should probably profile the
gradient walker. I expect quite a lot of time to be spent there, at least for
the linear gradients.
Andrea
PS: I'm guilty of the gradient slowdown, but on the bright side it seems to be
the best implementation of radial gradients available, surpassing Quartz,
GhostScript, poppler, Adobe Reader, Acrobat Pro X... in correctness and/or
numerical stability.