[Pixman] [PATCH 3/6] Faster conversion from a8r8g8b8 to r5g6b5 in C code

Siarhei Siamashka siarhei.siamashka at gmail.com
Mon Dec 3 14:55:47 PST 2012


This change reduces 3 shifts, 3 ANDs and 2 ORs (total 8 arithmetic
operations) to 3 shifts, 2 ANDs and 2 ORs (total 7 arithmetic
operations).

We get garbage in the high 16 bits of the result, which might need
to be cleared when casting to uint16_t (it would bring us back to
total 8 arithmetic operations). However in the case if the result
of a8r8g8b8->r5g6b5 conversion is immediately stored to memory, no
extra instructions for clearing these garbage bits are needed.

This allows the a8r8g8b8->r5g6b5 conversion code to be compiled
into 4 instructions for ARM instead of 5 (assuming a good optimizing
compiler), which has no pipeline stalls on ARM11 as an additional
bonus.

The change in benchmark results for 'lowlevel-blt-bench src_8888_0565'
with PIXMAN_DISABLE="arm-simd arm-neon mips-dspr2 mmx sse2" and pixman
compiled by gcc-4.7.2:

    MIPS 74K        480MHz  :  40.44 MPix/s ->  40.13 MPix/s
    ARM11           700MHz  :  50.28 MPix/s ->  62.85 MPix/s
    ARM Cortex-A8  1000MHz  : 124.38 MPix/s -> 141.85 MPix/s
    ARM Cortex-A15 1700MHz  : 281.07 MPix/s -> 303.29 MPix/s
    Intel Core i7  2800MHz  : 515.92 MPix/s -> 531.16 MPix/s

The same trick was used in xomap (X server for Nokia N800/N810):
    http://repository.maemo.org/pool/diablo/free/x/xorg-server/
    xorg-server_1.3.99.0~git20070321-0osso20083801.tar.gz
---
 pixman/pixman-private.h |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
index 422f72c..a7574c0 100644
--- a/pixman/pixman-private.h
+++ b/pixman/pixman-private.h
@@ -886,9 +886,13 @@ pixman_list_move_to_front (pixman_list_t *list, pixman_link_t *link)
 static force_inline uint16_t
 convert_8888_to_0565 (uint32_t s)
 {
-    return ((((s) >> 3) & 0x001f) |
-            (((s) >> 5) & 0x07e0) |
-            (((s) >> 8) & 0xf800));
+    /* The following code can be compiled into just 4 instructions on ARM */
+    uint32_t a, b;
+    a = (s >> 3) & 0x1F001F;
+    b = s & 0xFC00;
+    a |= a >> 5;
+    a |= b >> 5;
+    return (uint16_t)a;
 }
 
 static force_inline uint32_t
-- 
1.7.8.6



More information about the Pixman mailing list