[Pixman] [PATCH] sse2: Using MMX and SSE 4.1

Sun Jun 17 11:06:54 PDT 2012

On Sun, Jun 17, 2012 at 8:27 AM, Bill Spitzak <spitzak at gmail.com> wrote:
> On 06/16/2012 07:08 AM, Siarhei Siamashka wrote:
>
>>> An alternative idea is instead of changing the algorithm across the
>>> board, we could stop requiring bit exact results. The main piece of work
>>> here is to change the test suite so that it will accept pixels up to
>>> some maximum relative error. There is already some support for this: the
>>> 'composite' test is using the 'pixel_checker_t" to do compare the pixman
>>> output with perfect pixels computed in double precision.
>>>
>>> This latter idea is ultimately more useful because it will allow much
>>> more flexibility in the kinds of SIMD instruction sets we can support.
>>
>>
>> This is also a very useful test, but it effectively requires to have
>> an alternative double precision implementation for all the pixman
>> functionality to be verified.
>
>
> I don't understand this.

The 'composite' test alone has limited utility. It checks the
correctness of composite operations performed with just a single
pixel. But in order to provide better coverage for the functionality
used by real applications, we also must test different image sizes
(inner loops of composite functions do unrolling, and bugs may be
potentially introduced both in the main loop body and in the handling
of leading/trailing pixels). Additionally, when skipping fully
transparent pixels, SIMD optimized code skips the whole groups of
them, etc. There are lots of corner cases which need to be checked.
But it's easier to demonstrate by using an example. Let's try to add a
bug in 'sse2_combine_add_u' function:

diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
index 70f8b77..fbea4f6 100644
--- a/pixman/pixman-sse2.c
+++ b/pixman/pixman-sse2.c
@@ -1348,7 +1348,7 @@ sse2_combine_add_u (pixman_implementation_t *imp,
     {
 	__m128i s;

-	s = combine4 ((__m128i*)ps, (__m128i*)pm);
+	s = _mm_setzero_si128 ();

 	save_128_aligned (
 	    (__m128i*)pd, _mm_adds_epu8 (s, load_128_aligned  ((__m128i*)pd)));

The patch above just introduces a bug into the code from "while (w >=
4)" loop. Let's see how it is handled by pixman test suite:

PASS: a1-trap-test
PASS: pdf-op-test
PASS: region-test
PASS: region-translate-test
PASS: fetch-test
PASS: oob-test
PASS: trap-crasher
PASS: alpha-loop
PASS: scaling-crash-test
PASS: scaling-helpers-test
PASS: gradient-crash-test
region_contains test passed (checksum=D2BF8C73)
PASS: region-contains-test

Wrong alpha value at (0, 0). Should be 0xff; got 0xf7. Source was
0x65, original dest was 0xf7
src: a8r8g8b8, alpha: none, origin 0 0
dst: a8r8g8b8, alpha: none, origin: 0 0

FAIL: alphamap
PASS: stress-test
composite traps test failed! (checksum=BE93DA05, expected E3112106)
FAIL: composite-traps-test
blitters test failed! (checksum=C8682A01, expected A364B5BF)
FAIL: blitters-test
glyph test failed! (checksum=B1B638A1, expected 1B7696A2)
FAIL: glyph-test
scaling test failed! (checksum=64788A7E, expected 80DF1CB2)
FAIL: scaling-test
affine test passed (checksum=1EF2175A)
PASS: affine-test
PASS: composite
=============================================
5 of 20 tests failed

The 'composite' test did not detect anything wrong as expected. Now
let's break the same 'sse2_combine_add_u' function completely by
inserting "return" into its very beginning:

diff --git a/pixman/pixman-sse2.c b/pixman/pixman-sse2.c
index 70f8b77..25c7aa0 100644
--- a/pixman/pixman-sse2.c
+++ b/pixman/pixman-sse2.c
@@ -1331,6 +1331,8 @@ sse2_combine_add_u (pixman_implementation_t *imp,
     const uint32_t* ps = src;
     const uint32_t* pm = mask;

+    return;
+
     while (w && (unsigned long)pd & 15)
     {
        s = combine1 (ps, pm);

Now 'composite' test can see that there is a problem:

PASS: a1-trap-test
PASS: pdf-op-test
PASS: region-test
PASS: region-translate-test
PASS: fetch-test
PASS: oob-test
PASS: trap-crasher
PASS: alpha-loop
PASS: scaling-crash-test
PASS: scaling-helpers-test
PASS: gradient-crash-test
region_contains test passed (checksum=D2BF8C73)
PASS: region-contains-test

Wrong alpha value at (0, 0). Should be 0xff; got 0xf7. Source was
0x65, original dest was 0xf7
src: a8r8g8b8, alpha: none, origin 0 0
dst: a8r8g8b8, alpha: none, origin: 0 0

FAIL: alphamap
PASS: stress-test
composite traps test failed! (checksum=4B0E22E6, expected E3112106)
FAIL: composite-traps-test
blitters test failed! (checksum=E95FFC20, expected A364B5BF)
FAIL: blitters-test
glyph test failed! (checksum=FDF0BD54, expected 1B7696A2)
FAIL: glyph-test
scaling test failed! (checksum=55981EC2, expected 80DF1CB2)
FAIL: scaling-test
affine test passed (checksum=1EF2175A)
PASS: affine-test
---- Test 3145752 failed ----
Operator:      ADD
---- Test 4194328 failed ----
Operator:      ADD
Source:        r3g3b2, 1x1
Destination:   x4r4g4b4, 1x1

Source:        a1r1g1b1, 1x1
Destination:   a2r2g2b2, 1x1

               R     G     B     A         Rounded
Source color:  1.000 1.000 1.000 0.000     1.000 1.000 1.000 0.000
               R     G     B     A         Rounded
Dest. color:   0.000 0.000 0.000 1.000     0.000 0.000 0.000 1.000
Expected:      1.000 1.000 1.000 1.000
Got:               0     0     0     3  [pixel: 0x000000c0]
Min accepted:      3     3     3     3
Max accepted:      3     3     3     3
Test 0x00300018 failed.
Source color:  1.000 1.000 1.000 0.000     1.000 1.000 1.000 1.000
Dest. color:   0.000 0.000 0.000 1.000     0.000 0.000 0.000 1.000
Expected:      1.000 1.000 1.000 1.000
Got:               0     0     0     0  [pixel: 0x00000000]
Min accepted:     15    15    15     0
Max accepted:     15    15    15     0
Test 0x00400018 failed.
FAIL: composite
=============================================
6 of 20 tests failed

And it's just a single corner case. I still believe that in order to
verify correctness for *all* pixman functionality, the reference
implementation must also implement *all* pixman functionality.
Otherwise we are wide open for all kind of bugs. Also
http://cgit.freedesktop.org/xorg/app/rendercheck has limited utility
for exactly the same reason. No surprise that using xrender in a bit
different way in the recent version of cairo suddenly exposes various
bugs:
    https://bugs.freedesktop.org/show_bug.cgi?id=47266

If we go for not requiring exact results for bilinear scaling and
different implementations diverge, then a lot of current pixman tests
just become useless and need a viable replacement. IMHO just changing
to use the same 7-bit precision for bilinear interpolation everywhere
is much easier, because it can be done now with little effort. In the
long term the tests surely are better to be improved.

-- 
Best regards,
Siarhei Siamashka