[Pixman] testsuite fails on power7

Thu Aug 29 12:56:49 PDT 2013

On Thu, 29 Aug 2013 15:18:57 -0400
"Lennart Sorensen" <lsorense at csclub.uwaterloo.ca> wrote:

> I get crashes in the scaling and affinity tests on power7.  The crashes
> are always in the vmx code, so building with vmx support disabled makes
> the problem go away.
> 
> The error is not consistent, so my current guess is that multiple threads
> are running and depending on timing one thread manages to sometimes
> corrupt another and cause it to fail.
> 
> As far as I can tell, it doesn't fail on power5 or power6 machines,
> but given the interesting memory model of the powerpc and requirement
> for explicit syncs and barriers to ensure things have really made it to
> memory and other CPUs, the power7 has managed to show up bugs in glibc
> and gcc already where power5 and power6 and other powerpc systems never
> failed before.
> 
> Any suggestions on how to debug this or where to look?  Any traces or
> logs that would be helpful?
> 
> I am currently using version 0.26.0-4 debian package on Debian 7 (wheezy).
> 
> Interestingly, if I change the version of libc to 2.17 instead of 2.13
> that wheezy is using, then the problem also disappears, but again, this
> might just be a timing change causing this, or perhaps there is something
> relevant changed in the newer libc, although I haven't spotted anything
> suspicious looking when doing a diff so far.

VMX/Altivec is a bit tricky because all the vector load/store
operations only work with 16-byte aligned addresses (the lowest 4 bits
of the address are simply ignored). For the unaligned reads/writes,
pixman seems to use the LOAD_VECTORS and STORE_VECTOR macros:

    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-vmx.c?id=pixman-0.30.2#n151
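To make the unaligned-read idea concrete, here is a scalar C model of
an unaligned 16-byte read built from two aligned reads plus a byte
shift, which is the job a permute with a shift mask does in real vector
code. It assumes the aligned load ignores the low 4 address bits (as
Altivec lvx does); vec16, aligned_load and unaligned_load are made-up
names for illustration, not pixman code. Both aligned loads also pick
up bytes outside the 16 requested ones, which is harmless for reads
(an aligned 16-byte block never crosses a page boundary). The trouble
only starts with the matching stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16

typedef struct { uint8_t b[VEC_BYTES]; } vec16;

/* Model of an aligned vector load: the low 4 address bits are ignored,
 * so the 16 bytes come from the enclosing aligned block. */
static vec16 aligned_load(const uint8_t *p)
{
    vec16 v;
    const uint8_t *base = p - ((uintptr_t)p & (VEC_BYTES - 1));
    memcpy(v.b, base, VEC_BYTES);
    return v;
}

/* Unaligned load from two aligned loads (at p and p + 15) and a shift
 * by the misalignment. If p is aligned, both loads hit the same block. */
static vec16 unaligned_load(const uint8_t *p)
{
    size_t shift = (uintptr_t)p & (VEC_BYTES - 1);
    vec16 lo = aligned_load(p);
    vec16 hi = aligned_load(p + VEC_BYTES - 1);
    vec16 out;

    for (size_t i = 0; i < VEC_BYTES; i++) {
        size_t k = shift + i;
        out.b[i] = (k < VEC_BYTES) ? lo.b[k] : hi.b[k - VEC_BYTES];
    }
    return out;
}
```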

The STORE_VECTOR macro is particularly interesting because it performs
two stores. We can have a look at a typical combiner function, such
as "vmx_combine_over_u_no_mask":

    http://cgit.freedesktop.org/pixman/tree/pixman/pixman-vmx.c?id=pixman-0.30.2#n187

If the destination buffer is unaligned and the width is a perfect
multiple of 4 pixels, I believe that we may have some writes crossing
the boundaries of the destination buffer.
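The boundary arithmetic can be sketched in C. Assuming a store that
always writes whole 16-byte aligned blocks, the two hypothetical
helpers below count how many bytes before and after a range
[addr, addr + len) share an aligned block with it and would therefore
be rewritten too. For a 64-byte row (16 pixels of 32 bits) starting 4
bytes past a boundary, 4 bytes before the row and 12 bytes after its
end are touched; for an aligned row both counts are 0.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK ((uintptr_t)16)

static uintptr_t round_down16(uintptr_t a) { return a & ~(BLOCK - 1); }
static uintptr_t round_up16(uintptr_t a)   { return round_down16(a + BLOCK - 1); }

/* Bytes in the first touched aligned block that precede addr. */
static size_t bytes_before(uintptr_t addr)
{
    return (size_t)(addr - round_down16(addr));
}

/* Bytes in the last touched aligned block that follow addr + len
 * (meaningful for len > 0). */
static size_t bytes_after(uintptr_t addr, size_t len)
{
    uintptr_t end = addr + len;
    return (size_t)(round_up16(end) - end);
}
```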

I suspect that it just reads the data outside the destination buffer,
modifies the parts which really belong to the destination image and
writes everything back (so that the chunk of memory outside the
destination buffer is restored by the STORE_VECTOR macro to the value
that it had at the time of the LOAD_VECTORS invocation). Without heavy
multithreading this happens to work just fine. But with many concurrent
threads, the chunk of data beyond the destination buffer may be
actively used by some other thread, creating a race condition.
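The lost update can be reproduced deterministically without real
threads. The sketch below is a scalar model, not the pixman macros:
the hypothetical rmw_store16 snapshots every aligned block covering
the 16 destination bytes, lets a hypothetical "other" callback write in
between (standing in for another CPU), then merges the new bytes and
writes the whole blocks back, so the other writer's byte silently
reverts to its stale value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC 16

/* Hypothetical neighbour thread: writes a marker into a byte it owns. */
static void neighbour_writes(uint8_t *owned_byte)
{
    *owned_byte = 0xAA;
}

/* Model of a read-modify-write store of 16 bytes at dst using only whole
 * 16-byte aligned blocks. 'other' (may be NULL) runs between the load
 * and the store, like a write from another CPU landing in that window. */
static void rmw_store16(uint8_t *dst, const uint8_t *src,
                        void (*other)(uint8_t *), uint8_t *other_arg)
{
    uintptr_t offset = (uintptr_t)dst & (VEC - 1);
    uint8_t *base = dst - offset;               /* first aligned block */
    size_t span = offset ? 2 * VEC : VEC;       /* blocks covering dst..dst+15 */
    uint8_t blocks[2 * VEC];

    memcpy(blocks, base, span);                 /* load: snapshot of the blocks */
    if (other)
        other(other_arg);                       /* concurrent write in the window */
    memcpy(blocks + offset, src, VEC);          /* merge the 16 bytes we own */
    memcpy(base, blocks, span);                 /* store: stale edges written back */
}
```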

That was just a guess based on a quick look at the pixman vmx code.
You could try experimenting with replacing malloc by something that
allocates memory blocks with 16-byte granularity (for both the starting
address and the size). This would make sure that each 16-byte aligned
memory chunk is never shared by multiple threads. If the crashes
disappear, then that's probably it. And perhaps libc 2.17 is enforcing
something like this.
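A minimal sketch of such an allocation policy in C, assuming C11
aligned_alloc is available; malloc16 and round_up_granule are
hypothetical names. It only helps if every allocation in the process
goes through it, and actually interposing it as malloc (together with
calloc, realloc and free, e.g. via LD_PRELOAD) is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define GRANULE ((size_t)16)

/* Round a request up to a whole number of 16-byte granules. */
static size_t round_up_granule(size_t size)
{
    return (size + GRANULE - 1) & ~(GRANULE - 1);
}

/* Hypothetical 16-byte granular allocator (not a full malloc override):
 * the start is 16-byte aligned and the size is a multiple of 16, so no
 * aligned 16-byte chunk of this block is shared with another allocation
 * made the same way. Release the result with free(). */
static void *malloc16(size_t size)
{
    if (size == 0)
        size = 1;                             /* still hand out one granule */
    if (size > SIZE_MAX - (GRANULE - 1))
        return NULL;                          /* rounding would overflow */
    return aligned_alloc(GRANULE, round_up_granule(size));
}
```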

-- 
Best regards,
Siarhei Siamashka