[Pixman] [PATCH] MIPS: DSPr2: Added over_n_8_8888 and over_n_8_0565 fast paths.

Matt Turner mattst88 at gmail.com
Tue Apr 3 11:16:20 PDT 2012


On Tue, Apr 3, 2012 at 1:30 PM, Nemanja Lukic <nlukic at mips.com> wrote:
> From: Nemanja Lukic <nemanja.lukic at rt-rk.com>
>
> Performance numbers before/after on MIPS-74kc @ 1GHz
>
> Referent (before):
>
> lowlevel-blt-bench:
>     over_n_8_8888 =  L1:  10.71  L2:  10.11  M:  8.70 ( 34.57%)  HT:  7.82  VT:  7.77  R:  7.66  RT:  5.37 (  41Kops/s)
>     over_n_8_0565 =  L1:   8.24  L2:   8.04  M:  7.49 ( 19.84%)  HT:  6.82  VT:  6.75  R:  6.70  RT:  4.85 (  40Kops/s)
> cairo-perf-trace:
> [ # ]  backend                         test   min(s) median(s) stddev. count
> [ # ]    image: pixman 0.25.3
> [  0]    image           swfdec-giant-steps   76.936   77.822   0.49%    6/6
> [  1]    image         gnome-system-monitor  277.838  278.500   0.16%    6/6
> [ # ]  image16: pixman 0.25.3
> [  0]    image16         swfdec-giant-steps   60.598   61.966   1.10%    6/6
> [  1]    image16       gnome-system-monitor  277.628  277.675   0.02%    6/6
>
> Optimized:
>
> lowlevel-blt-bench:
>     over_n_8_8888 =  L1:  18.38  L2:  17.29  M: 13.49 ( 53.58%)  HT: 11.44  VT: 11.31  R: 11.05  RT:  6.65 (  47Kops/s)
>     over_n_8_0565 =  L1:  12.42  L2:  11.86  M: 10.68 ( 28.28%)  HT:  9.27  VT:  9.16  R:  9.04  RT:  5.83 (  44Kops/s)
> cairo-perf-trace:
> [ # ]  backend                         test   min(s) median(s) stddev. count
> [ # ]    image: pixman 0.25.3
> [  0]    image           swfdec-giant-steps   71.430   71.593   0.18%    6/6
> [  1]    image         gnome-system-monitor  253.903  254.007   0.02%    6/6
> [ # ]  image16: pixman 0.25.3
> [  0]    image16         swfdec-giant-steps   58.791   59.358   0.62%    6/6
> [  1]    image16       gnome-system-monitor  253.713  253.863   0.03%    6/6
> ---

The performance increase seems a lot smaller than I'd have expected
(my Lemote optimized code is 40% faster in over_n_8_0565 and 60%
faster in over_n_8_8888). By skipping pixels when src == 0xff, mask ==
0xff, or mask == 0x00 you should gain quite a bit of performance. It
looks like your code already does this though, so I'm not sure where
the difference comes from.
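
For reference, the kind of shortcut meant here can be sketched in plain
C roughly as follows. This is only an illustration, not the DSPr2 or
Loongson code: the function and helper names are made up, "src == 0xff"
is read as "the source alpha is 0xff", and the destination is assumed to
be premultiplied a8r8g8b8.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding. */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Scale every channel of a premultiplied pixel by m/255 (the IN step). */
static uint32_t scale_pixel(uint32_t p, uint8_t m)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8)
        out |= (uint32_t)mul_un8((uint8_t)(p >> shift), m) << shift;
    return out;
}

/* Premultiplied OVER: src + dst * (255 - src_alpha) / 255 per channel. */
static uint32_t over(uint32_t src, uint32_t dst)
{
    uint8_t ia = (uint8_t)(255 - (src >> 24));
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((src >> shift) & 0xff)
                   + mul_un8((uint8_t)(dst >> shift), ia);
        if (c > 255)
            c = 255;            /* saturate, in case src is not premultiplied */
        out |= c << shift;
    }
    return out;
}

/* Hypothetical scalar over_n_8_8888 for one scanline of n pixels. */
static void over_n_8_8888_sketch(uint32_t src, const uint8_t *mask,
                                 uint32_t *dst, size_t n)
{
    if (src == 0)
        return;                 /* fully transparent source: nothing to do */

    int opaque = (src >> 24) == 0xff;

    for (size_t i = 0; i < n; i++) {
        uint8_t m = mask[i];

        if (m == 0x00)
            continue;           /* mask == 0x00: destination is unchanged */

        if (m == 0xff)          /* mask == 0xff: no need to scale the source */
            dst[i] = opaque ? src : over(src, dst[i]);
        else                    /* general case: (src IN mask) OVER dst */
            dst[i] = over(scale_pixel(src, m), dst[i]);
    }
}
```

With an opaque source, the mask == 0xff case becomes a plain store and
the mask == 0x00 case touches no memory at all, so how much this helps
depends on how many fully covered or fully empty pixels the mask has.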

