[Pixman] [PATCH] ARM: make use of UQADD8 instruction even in generic C code paths

Siarhei Siamashka siarhei.siamashka at gmail.com
Thu Dec 6 10:17:50 PST 2012


On Thu,  6 Dec 2012 19:45:44 +0200
Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:

> ARMv6 has the UQADD8 instruction, which implements unsigned saturated
> addition for 8-bit values packed in 32-bit registers. It is very useful
> for the UN8x4_ADD_UN8x4, UN8_rb_ADD_UN8_rb and ADD_UN8 macros (which would
> otherwise need a lot of arithmetic operations to simulate this operation).
> Since most of the major ARM Linux distros are built for ARMv7, we are
> much less dependent on runtime CPU detection and can get practical
> benefits from conditional compilation here for a lot of users.
> 
> The results of the cairo-perf-trace benchmark on ARM Cortex-A15 with pixman
> compiled by gcc 4.7.2 and PIXMAN_DISABLE set to "arm-simd arm-neon":
> 
> Speedups
> ========
> image    firefox-talos-gfx  (29938.22 0.12%) ->  (27814.76 0.51%) : 1.08x speedup
> image    firefox-asteroids  (23241.11 0.07%) ->  (21795.19 0.07%) : 1.07x speedup
> image firefox-canvas-alpha (174519.85 0.08%) -> (164788.64 0.20%) : 1.06x speedup
> image              poppler   (9464.46 1.61%) ->   (8991.53 0.14%) : 1.05x speedup
> ---
>  pixman/pixman-combine32.h |   47 +++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 47 insertions(+), 0 deletions(-)
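
For readers who have not looked at these macros, the saturated addition
described in the quoted text can be sketched roughly as below. This is an
illustration only, not the code from the patch: the function name
add_un8x4_sat and the choice of the ACLE macro __ARM_FEATURE_SIMD32 as the
compile-time guard are assumptions made for this sketch. The portable branch
shows the kind of mask-and-carry arithmetic that a single UQADD8 replaces.

```c
#include <stdint.h>

/* Hypothetical helper: per-byte unsigned saturated add of two packed
 * 8-bit x 4 values.  Not the actual pixman macro. */
static inline uint32_t add_un8x4_sat(uint32_t x, uint32_t y)
{
#if defined(__GNUC__) && defined(__ARM_FEATURE_SIMD32)
    /* Assumed guard: the compiler targets a core with the ARMv6
     * 32-bit SIMD instructions, so UQADD8 can be used directly. */
    uint32_t r;
    __asm__ ("uqadd8 %0, %1, %2" : "=r" (r) : "r" (x), "r" (y));
    return r;
#else
    /* Portable fallback: add bytes 0 and 2 in one word and bytes 1
     * and 3 in another, so each byte has 8 spare bits for its carry. */
    uint32_t rb = (x & 0x00ff00ffu) + (y & 0x00ff00ffu);
    uint32_t ag = ((x >> 8) & 0x00ff00ffu) + ((y >> 8) & 0x00ff00ffu);

    /* A carry (0 or 1) sits just above each byte.  Subtracting it from
     * 0x01000100 yields all-ones in the byte that overflowed, which is
     * then ORed in to clamp that byte to 0xff. */
    rb |= 0x01000100u - ((rb >> 8) & 0x00ff00ffu);
    ag |= 0x01000100u - ((ag >> 8) & 0x00ff00ffu);

    return (rb & 0x00ff00ffu) | ((ag & 0x00ff00ffu) << 8);
#endif
}
```

For example, add_un8x4_sat(0x80808080, 0x80808080) clamps every byte and
gives 0xffffffff, while add_un8x4_sat(0x01020304, 0x10203040) does not
overflow anywhere and gives 0x11223344.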

I forgot to mention that the benchmark numbers above assume the patch for
faster combine_over_u has already been applied to pixman:
    http://lists.freedesktop.org/archives/pixman/2012-November/002384.html

If we apply only the UQADD8 patch and compare the performance
with the current pixman git, we get:

Speedups
========
image    firefox-paintball (622686.54 0.03%) -> (566993.97 0.10%) : 1.10x speedup
image        chromium-tabs    (737.67 0.12%) ->    (682.36 0.27%) : 1.08x speedup
image     firefox-fishtank (513843.85 0.06%) -> (479705.23 0.12%) : 1.07x speedup
image    firefox-talos-gfx  (29954.45 0.18%) ->  (28382.82 0.55%) : 1.07x speedup
image    firefox-asteroids  (24591.65 0.14%) ->  (23239.72 0.10%) : 1.06x speedup
image firefox-canvas-alpha (190829.98 0.08%) -> (180617.98 0.05%) : 1.06x speedup
image              poppler   (9484.97 0.06%) ->   (8998.34 0.06%) : 1.06x speedup
image     firefox-fishbowl (421040.07 0.06%) -> (400184.18 0.15%) : 1.05x speedup
image       firefox-canvas  (90428.10 0.06%) ->  (86074.26 0.11%) : 1.05x speedup


If we apply both the UQADD8 and combine_over_u patches and compare the
performance with the current pixman git, we get:

Speedups
========
image    firefox-paintball (622686.54 0.03%) -> (426471.59 0.03%) : 1.46x speedup
image     firefox-fishtank (513843.85 0.06%) -> (375270.57 0.12%) : 1.37x speedup
image       firefox-canvas  (90428.10 0.06%) ->  (67424.41 0.02%) : 1.34x speedup
image     firefox-fishbowl (421040.07 0.06%) -> (356533.95 0.11%) : 1.18x speedup
image    firefox-talos-svg (127530.61 0.02%) -> (108182.31 0.10%) : 1.18x speedup
image firefox-canvas-alpha (190829.98 0.08%) -> (164788.64 0.20%) : 1.16x speedup
image    firefox-asteroids  (24591.65 0.14%) ->  (21795.19 0.07%) : 1.13x speedup
image    firefox-particles (214047.38 0.10%) -> (194802.61 0.03%) : 1.10x speedup
image       swfdec-youtube   (8437.75 2.06%) ->   (7692.13 0.63%) : 1.10x speedup
image        chromium-tabs    (737.67 0.12%) ->    (681.24 0.24%) : 1.08x speedup
image    firefox-talos-gfx  (29954.45 0.18%) ->  (27814.76 0.51%) : 1.08x speedup
image   firefox-chalkboard (156512.01 0.08%) -> (147481.59 0.12%) : 1.06x speedup
image              poppler   (9484.97 0.06%) ->   (8991.53 0.14%) : 1.06x speedup


The UQADD8 patch helps in the translucent cases, and the combine_over_u
patch helps in the transparent and opaque cases (alpha is 0x00 or 0xFF).
The two work quite nicely together.
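
To make the split between the two cases concrete, here is a rough sketch of
an OVER operation on a single premultiplied a8r8g8b8 pixel. It is not the
code from either patch; the names over_pixel, mul_un8x4_un8 and
sat_add_un8x4 are hypothetical and exist only for this illustration. The two
early returns correspond to the opaque and transparent cases, and the final
line is the translucent case, where the saturated add is the step that
UQADD8 can perform in one instruction.

```c
#include <stdint.h>

/* Hypothetical helper: multiply each 8-bit channel of x by a/255,
 * rounding to nearest, two channels at a time. */
static uint32_t mul_un8x4_un8(uint32_t x, uint32_t a)
{
    uint32_t rb = (x & 0x00ff00ffu) * a + 0x00800080u;
    uint32_t ag = ((x >> 8) & 0x00ff00ffu) * a + 0x00800080u;

    rb = ((rb + ((rb >> 8) & 0x00ff00ffu)) >> 8) & 0x00ff00ffu;
    ag = ((ag + ((ag >> 8) & 0x00ff00ffu)) >> 8) & 0x00ff00ffu;
    return rb | (ag << 8);
}

/* Hypothetical helper: per-byte saturated add, written as a plain loop.
 * This is the operation that UQADD8 performs in hardware. */
static uint32_t sat_add_un8x4(uint32_t x, uint32_t y)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = ((x >> shift) & 0xffu) + ((y >> shift) & 0xffu);
        if (s > 0xffu)
            s = 0xffu;
        r |= s << shift;
    }
    return r;
}

/* OVER for one premultiplied pixel: dst = src + dst * (255 - alpha) / 255 */
static uint32_t over_pixel(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;

    if (a == 0xffu)
        return src;   /* opaque source: destination is fully replaced */
    if (src == 0)
        return dst;   /* transparent source (alpha 0x00 and, being
                       * premultiplied, all channels 0): no change */

    /* translucent source: needs the multiply and the saturated add */
    return sat_add_un8x4(src, mul_un8x4_un8(dst, 255u - a));
}
```

With this sketch, over_pixel(0x80800000, 0xff0000ff) takes the translucent
path and returns 0xff80007f, while an opaque or all-zero source returns
without touching the arithmetic at all.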

-- 
Best regards,
Siarhei Siamashka

