[Pixman] testsuite fails on power7

Thu Aug 29 13:23:49 PDT 2013

On Thu, Aug 29, 2013 at 10:56:49PM +0300, Siarhei Siamashka wrote:
> On Thu, 29 Aug 2013 15:18:57 -0400
> "Lennart Sorensen" <lsorense at csclub.uwaterloo.ca> wrote:
> 
> > I get crashes in the scaling and affine tests on power7.  The crashes
> > are always in the vmx code, so building with vmx support disabled makes
> > the problem go away.
> > 
> > The failure is intermittent, so my current guess is that multiple threads
> > are running and, depending on timing, one thread sometimes manages to
> > corrupt another thread's data and cause it to fail.
> > 
> > As far as I can tell, it doesn't fail on power5 or power6 machines.
> > But given the weakly ordered memory model of powerpc, which requires
> > explicit syncs and barriers to ensure stores have really reached memory
> > and other CPUs, power7 has already exposed bugs in glibc and gcc that
> > never showed up on power5, power6 or other powerpc systems.
> > 
> > Any suggestions on how to debug this or where to look?  Any traces or
> > logs that would be helpful?
> > 
> > I am currently using the 0.26.0-4 Debian package on Debian 7 (wheezy).
> > 
> > Interestingly, if I switch from the libc 2.13 that wheezy uses to
> > libc 2.17, the problem also disappears. Again, this might just be a
> > timing change, or perhaps something relevant changed in the newer libc,
> > although I haven't spotted anything suspicious in a diff so far.
> 
> VMX/Altivec is a bit tricky because all the vector load/store
> operations must be aligned. For the unaligned reads/writes, pixman
> seems to use the LOAD_VECTORS and STORE_VECTOR macros:

My understanding was that vec_lda must be aligned but vec_ld does not
have to be aligned.

>     http://cgit.freedesktop.org/pixman/tree/pixman/pixman-vmx.c?id=pixman-0.30.2#n151
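For reference, classic AltiVec `vec_ld` compiles to the `lvx` instruction, which does not fault on an unaligned address: it silently clears the low four address bits and loads the enclosing 16-byte block. An unaligned load is therefore usually built from two aligned loads plus a `vec_perm` driven by a `vec_lvsl` mask. The portable C below only emulates that idiom on a byte array (no AltiVec needed); `lvx_block` and `load_unaligned16` are hypothetical names, and I am assuming, not confirming, that LOAD_VECTORS follows this pattern. Note that the two aligned loads can touch bytes on both sides of the requested 16, i.e. outside the buffer itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Emulates lvx / vec_ld: the low four address bits are ignored, so the
 * load always returns the aligned 16-byte block containing addr.
 * Addresses here are offsets into a byte array treated as 16-byte aligned. */
static size_t lvx_block(size_t addr)
{
    return addr & ~(size_t)15;
}

/* Emulated unaligned 16-byte load from mem + addr: two aligned loads,
 * then a byte permute (the vec_lvsl + vec_perm idiom on big-endian AltiVec).
 * mem must be readable up to lvx_block(addr + 15) + 16. */
static void load_unaligned16(const uint8_t *mem, size_t addr, uint8_t out[16])
{
    uint8_t lo[16], hi[16];
    size_t shift = addr & 15;                     /* what vec_lvsl(0, p) encodes */

    memcpy(lo, mem + lvx_block(addr), 16);        /* like vec_ld(0, p)  */
    memcpy(hi, mem + lvx_block(addr + 15), 16);   /* like vec_ld(15, p) */

    /* like vec_perm(lo, hi, mask): byte i comes from (lo || hi)[shift + i] */
    for (size_t i = 0; i < 16; i++)
        out[i] = (shift + i < 16) ? lo[shift + i] : hi[shift + i - 16];
}
```

When addr is already aligned, both loads hit the same block and the permute is an identity, so the same code path serves both cases.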
> 
> The STORE_VECTOR macro is particularly interesting because it performs
> two stores. We can have a look at the typical combiner function, such
> as "vmx_combine_over_u_no_mask":
> 
>     http://cgit.freedesktop.org/pixman/tree/pixman/pixman-vmx.c?id=pixman-0.30.2#n187
> 
> If the destination buffer is unaligned and the width is an exact
> multiple of 4 pixels, I believe that some writes may cross the
> boundaries of the destination buffer.
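The boundary crossing is plain address arithmetic. The sketch below (portable C, hypothetical helper `aligned_spill`, not pixman code) counts how many bytes outside [addr, addr + len) lie within the 16-byte aligned blocks that whole-block aligned stores must touch to cover that range. With 32-bit pixels and a width that is a multiple of 4, len is a multiple of 16, so any misalignment of the start spills over at both ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes outside [addr, addr + len) that share a 16-byte aligned block
 * with the range, i.e. bytes an aligned store of whole blocks rewrites. */
typedef struct {
    size_t before;   /* bytes between the first block's start and addr */
    size_t after;    /* bytes between addr + len and the last block's end */
} spill_t;

static spill_t aligned_spill(uintptr_t addr, size_t len)
{
    spill_t s;
    s.before = addr & 15;
    s.after  = (16 - ((addr + len) & 15)) & 15;
    return s;
}
```

For example, a 16-pixel row (64 bytes) starting 4 bytes past a 16-byte boundary gives `before == 4` and `after == 12`, while the same row starting on a boundary gives 0 and 0.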
> 
> I suspect that it just reads the data outside the destination buffer,
> modifies the parts which really belong to the destination image and
> writes everything back (so the chunk of memory outside the destination
> buffer is restored by the STORE_VECTOR macro to the value it had when
> LOAD_VECTORS ran). Without heavy multithreading this kinda works just
> fine. But with many concurrent threads, the chunk of data beyond the
> destination buffer may be in active use by some other thread, creating
> a race condition.
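The race described above is a lost update on a shared 16-byte chunk. Rather than relying on real threads, whose timing is nondeterministic, the hypothetical `simulate_lost_update` below replays one bad interleaving sequentially: thread 1 loads the whole chunk, thread 2 writes a byte it legitimately owns, then thread 1 writes the whole chunk back and silently reverts thread 2's write.

```c
#include <stdint.h>
#include <string.h>

/* Replays one interleaving on a single 16-byte aligned chunk.
 * Bytes 0..7 belong to thread 1's destination image; bytes 8..15 belong
 * to some other buffer that thread 2 is updating. Returns the final
 * value of thread 2's byte. */
static int simulate_lost_update(uint8_t chunk[16])
{
    uint8_t reg[16];                 /* stands in for a vector register */

    memset(chunk, 0, 16);

    memcpy(reg, chunk, 16);          /* thread 1: aligned load of whole chunk */
    chunk[12] = 0xAB;                /* thread 2: writes its own byte */
    memset(reg, 0xFF, 8);            /* thread 1: modifies only bytes 0..7 */
    memcpy(chunk, reg, 16);          /* thread 1: aligned store of whole chunk */

    return chunk[12];                /* 0, not 0xAB: thread 2's write is lost */
}
```

With real threads the window between the load and the store is tiny, which is consistent with failures that only show up intermittently.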
> 
> That was just a guess based on a quick look at the pixman vmx code.
> You could try experimenting with overriding malloc by something that
> allocates memory blocks with 16-byte granularity (for both the starting
> address and the size). This would make sure that no 16-byte aligned
> memory chunk is ever shared by multiple threads. If the crashes
> disappear, then that's probably it. Perhaps libc 2.17 is already
> enforcing something like this.
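A minimal sketch of the allocation side of that experiment, assuming POSIX `posix_memalign` is available; `alloc16` and `round_up16` are hypothetical names. Both the start address and the size are multiples of 16, so an allocation only ever occupies whole 16-byte chunks and cannot share one with a neighbouring allocation. Actually replacing malloc (for example via an LD_PRELOAD interposer that also covers calloc, realloc and free) is not shown here.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdint.h>
#include <stdlib.h>

/* Round n up to the next multiple of 16 (caller ensures n <= SIZE_MAX - 15). */
static size_t round_up16(size_t n)
{
    return (n + 15) & ~(size_t)15;
}

/* Hypothetical debugging allocator: the returned block starts on a 16-byte
 * boundary and spans a whole number of 16-byte chunks, so no aligned chunk
 * is shared with another allocation. Release it with free(). */
static void *alloc16(size_t n)
{
    void *p = NULL;

    if (n == 0)
        n = 1;                        /* still hand out one full chunk */
    if (n > SIZE_MAX - 15)
        return NULL;                  /* rounding would overflow */
    if (posix_memalign(&p, 16, round_up16(n)) != 0)
        return NULL;
    return p;
}
```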

Well, running under valgrind shows that LOAD_VECTORS and STORE_VECTOR
do sometimes read and write outside the malloc'd area.  I tried simply
making malloc allocate 16 extra bytes, but that did not solve the issue,
so it seems to be something more complicated than that.

I am not sure whether vec_ld is implemented in the compiler or in libc,
and I can't remember whether I was still using the same gcc version when testing
with libc 2.17.  I am using gcc 4.6 from Debian wheezy at the moment.
I am pretty sure I tried with 4.7 as well with no change in behaviour.

-- 
Len Sorensen