[Pixman] [PATCH] ARM: make use of UQADD8 instruction even in generic C code paths
Siarhei Siamashka
siarhei.siamashka at gmail.com
Thu Dec 6 10:17:50 PST 2012
On Thu, 6 Dec 2012 19:45:44 +0200
Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
> ARMv6 has the UQADD8 instruction, which implements unsigned saturated
> addition for 8-bit values packed in 32-bit registers. It is very useful
> for UN8x4_ADD_UN8x4, UN8_rb_ADD_UN8_rb and ADD_UN8 macros (which would
> otherwise need a lot of arithmetic operations to simulate this operation).
> Since most of the major ARM Linux distros are built for ARMv7, we are
> much less dependent on runtime CPU detection and can get practical
> benefits from conditional compilation here for a lot of users.
>
> The results of cairo-perf-trace benchmark on ARM Cortex-A15 with pixman
> compiled by gcc 4.7.2 and PIXMAN_DISABLE set to "arm-simd arm-neon":
>
> Speedups
> ========
> image firefox-talos-gfx (29938.22 0.12%) -> (27814.76 0.51%) : 1.08x speedup
> image firefox-asteroids (23241.11 0.07%) -> (21795.19 0.07%) : 1.07x speedup
> image firefox-canvas-alpha (174519.85 0.08%) -> (164788.64 0.20%) : 1.06x speedup
> image poppler (9464.46 1.61%) -> (8991.53 0.14%) : 1.05x speedup
> ---
> pixman/pixman-combine32.h | 47 +++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 47 insertions(+), 0 deletions(-)
Forgot to mention: the benchmark numbers above assume that the patch for a
faster combine_over_u has already been applied to pixman:
http://lists.freedesktop.org/archives/pixman/2012-November/002384.html
If we apply only this UQADD8 patch and compare the performance
with the current pixman git, we get:
Speedups
========
image firefox-paintball (622686.54 0.03%) -> (566993.97 0.10%) : 1.10x speedup
image chromium-tabs (737.67 0.12%) -> (682.36 0.27%) : 1.08x speedup
image firefox-fishtank (513843.85 0.06%) -> (479705.23 0.12%) : 1.07x speedup
image firefox-talos-gfx (29954.45 0.18%) -> (28382.82 0.55%) : 1.07x speedup
image firefox-asteroids (24591.65 0.14%) -> (23239.72 0.10%) : 1.06x speedup
image firefox-canvas-alpha (190829.98 0.08%) -> (180617.98 0.05%) : 1.06x speedup
image poppler (9484.97 0.06%) -> (8998.34 0.06%) : 1.06x speedup
image firefox-fishbowl (421040.07 0.06%) -> (400184.18 0.15%) : 1.05x speedup
image firefox-canvas (90428.10 0.06%) -> (86074.26 0.11%) : 1.05x speedup
If we apply both the UQADD8 and combine_over_u patches and compare the
performance with the current pixman git, we get:
Speedups
========
image firefox-paintball (622686.54 0.03%) -> (426471.59 0.03%) : 1.46x speedup
image firefox-fishtank (513843.85 0.06%) -> (375270.57 0.12%) : 1.37x speedup
image firefox-canvas (90428.10 0.06%) -> (67424.41 0.02%) : 1.34x speedup
image firefox-fishbowl (421040.07 0.06%) -> (356533.95 0.11%) : 1.18x speedup
image firefox-talos-svg (127530.61 0.02%) -> (108182.31 0.10%) : 1.18x speedup
image firefox-canvas-alpha (190829.98 0.08%) -> (164788.64 0.20%) : 1.16x speedup
image firefox-asteroids (24591.65 0.14%) -> (21795.19 0.07%) : 1.13x speedup
image firefox-particles (214047.38 0.10%) -> (194802.61 0.03%) : 1.10x speedup
image swfdec-youtube (8437.75 2.06%) -> (7692.13 0.63%) : 1.10x speedup
image chromium-tabs (737.67 0.12%) -> (681.24 0.24%) : 1.08x speedup
image firefox-talos-gfx (29954.45 0.18%) -> (27814.76 0.51%) : 1.08x speedup
image firefox-chalkboard (156512.01 0.08%) -> (147481.59 0.12%) : 1.06x speedup
image poppler (9484.97 0.06%) -> (8991.53 0.14%) : 1.06x speedup
The UQADD8 patch helps in the translucent cases, and the combine_over_u
patch helps in the transparent and opaque cases (alpha is 0x00 or 0xFF).
The two work quite nicely together.
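
To illustrate how the two ideas fit together, here is a minimal C sketch.
It is not the code from either patch. The names sat_add_un8x4,
mul_un8x4_un8 and over_sketch are made up for this example, and the
__ARM_FEATURE_SIMD32 guard is only an assumption about how one might
select the instruction at compile time. The OVER operator is assumed to
work on premultiplied a8r8g8b8 pixels.

```c
#include <stdint.h>

/* Unsigned saturating add of four 8-bit lanes packed in 32-bit words.
 * Hypothetical helper, not a pixman identifier. */
static inline uint32_t sat_add_un8x4(uint32_t x, uint32_t y)
{
#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_FEATURE_SIMD32)
    /* Assumed guard: one UQADD8 instruction does all four lanes. */
    uint32_t r;
    __asm__("uqadd8 %0, %1, %2" : "=r"(r) : "r"(x), "r"(y));
    return r;
#else
    /* Portable fallback: add each lane and clamp it to 0xFF. */
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = ((x >> shift) & 0xFF) + ((y >> shift) & 0xFF);
        if (s > 0xFF)
            s = 0xFF;
        r |= s << shift;
    }
    return r;
#endif
}

/* Multiply each 8-bit lane of p by the 8-bit value a, with rounded
 * division by 255. Hypothetical helper. */
static inline uint32_t mul_un8x4_un8(uint32_t p, uint32_t a)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t t = ((p >> shift) & 0xFF) * a + 0x80;
        t = (t + (t >> 8)) >> 8;
        r |= t << shift;
    }
    return r;
}

/* OVER for one premultiplied pixel, with shortcuts for the opaque
 * and transparent source cases. */
static inline uint32_t over_sketch(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;

    if (a == 0xFF)
        return src;   /* opaque source replaces the destination */
    if (src == 0)
        return dst;   /* fully transparent source changes nothing */

    /* Translucent case: dst * (255 - a) / 255 + src, saturated. */
    return sat_add_un8x4(src, mul_un8x4_un8(dst, 0xFF - a));
}
```

Only the translucent branch reaches the saturating add, which is where
UQADD8 can help. The two early returns skip the arithmetic entirely.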
--
Best regards,
Siarhei Siamashka