[Pixman] testsuite fails on power7

Lennart Sorensen lsorense at csclub.uwaterloo.ca
Tue Sep 17 07:03:11 PDT 2013


On Mon, Sep 16, 2013 at 10:24:10PM +0200, Søren Sandmann wrote:
> Lennart Sorensen <lsorense at csclub.uwaterloo.ca> writes:
> 
> > And to make it even more annoying to track down:
> >
> > It doesn't fail on a power6, only on a power7.  power7 machines are
> > known to have found numerous powerpc memory barrier bugs in code
> > (including compiler and library code), where earlier generations let
> > you get away with stuff that the architecture didn't actually allow,
> > but which usually worked.
> >
> > So it seems to be a bug triggered by: vmx code used with openmp with libc
> > 2.13 on power7.  Change any one of those 4 things, and the bug
> > vanishes.
> 
> As far as I can see, this is all still consistent with the bug being
> that the VMX combiners are writing outside the malloced memory:
> 
> - Disable OpenMP, and it doesn't matter because the same bytes are read
>   and written. With OpenMP, two threads can mess each other's memory up.
> 
> - Use different libc: malloc() may allocate different amounts of memory
>   so that the combiners don't write outside of the allocated area.
> 
> - Disable VMX: There is no writing outside the malloc()ed area
> 
> - Power 6: Could just be timing differences, but may also have to do
>   with different atomicity of the incorrect memory accesses.

The thing is that if I set the openmp environment to only use one thread,
the problem does NOT go away.  So either I did it wrong, or it's a bit
more tricky than that.
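
One way to rule out the "did it wrong" case is to have the test binary report the thread limit it actually sees. The following is only a sketch under assumptions: it assumes the limit was set through the standard OMP_NUM_THREADS environment variable (the message does not say which mechanism was used), and the helper names effective_max_threads and report_thread_config are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Upper bound on threads a parallel region may use.
 * Falls back to 1 when the file is built without OpenMP support. */
static int effective_max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Print the environment setting next to what the runtime reports
 * (hypothetical helper, not part of pixman). */
static void report_thread_config(FILE *out)
{
    const char *env = getenv("OMP_NUM_THREADS");

    assert(out != NULL);
    fprintf(out, "OMP_NUM_THREADS=%s, max threads=%d\n",
            env ? env : "(unset)", effective_max_threads());
}
```

Calling report_thread_config(stderr) at the start of a test would show whether the single-thread setting really reached the OpenMP runtime. If it reports 1 and the failure persists, thread interaction alone probably does not explain the bug.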

But perhaps it really is just that the unaligned vector operations are
smashing each other.

I did see one altivec document showing the use of vec_ste rather than
vec_st in a complex manner to do unaligned stores with a comment about
thread safety.  It may be that vec_st is NOT thread safe for unaligned
stores.
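
To make that concern concrete: vec_st ignores the low four bits of the address, so an unaligned 16-byte store is typically built from whole aligned blocks that are loaded, merged with the new data, and stored back. Below is a rough portable C model of that read-modify-write pattern. It uses no AltiVec intrinsics, works on offsets into a plain byte array, and the function names are invented for illustration. It is not pixman's code, and whether this pattern is the actual cause here is not established.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16u

/* Start of the aligned 16-byte block that contains offset `off`. */
static size_t block_floor(size_t off)
{
    return off & ~(size_t)(VEC_BYTES - 1);
}

/* Inclusive range of offsets rewritten by an emulated store at `off`. */
static void touched_range(size_t off, size_t *first, size_t *last)
{
    *first = block_floor(off);
    *last = block_floor(off + VEC_BYTES - 1) + VEC_BYTES - 1;
}

/* Model of an unaligned 16-byte store done with aligned blocks only:
 * read the covering block(s), merge the payload, write them back. */
static void emulated_unaligned_store(uint8_t *mem, size_t mem_size,
                                     size_t off, const uint8_t src[VEC_BYTES])
{
    size_t first, last;
    uint8_t tmp[2 * VEC_BYTES];

    touched_range(off, &first, &last);
    assert(last < mem_size);  /* the whole blocks must exist in the model */

    memcpy(tmp, mem + first, last - first + 1);   /* read whole blocks  */
    memcpy(tmp + (off - first), src, VEC_BYTES);  /* merge the payload  */
    memcpy(mem + first, tmp, last - first + 1);   /* write whole blocks */
}
```

For a store at offset 20, touched_range gives 16 through 47, so 16 bytes that do not belong to the payload are read and written back. In a single thread those bytes come back unchanged. If another thread modifies one of them between the read and the write, its update is silently overwritten with the stale value, and the rewritten range can also extend past the end of an allocation. Storing only the affected elements, as vec_ste does, avoids touching the neighbouring bytes.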

I did try padding the malloc size with some extra margin, and that did
not help at all.  If it were just each thread going into the range of
the bytes used by another thread, I would not expect that to corrupt
libc's malloc data structures, which certainly seems to be happening.
libc complains about double free or corrupt linked lists on some runs
(other times it just segfaults).
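
A margin can also be turned into a detector rather than just padding. The sketch below is one possible approach under assumptions, not anything from the pixman test suite: the guarded_* helpers, the 64-byte guard size, and the 0xA5 fill value are all arbitrary choices for illustration. It surrounds the buffer with known bytes and counts how many of them changed afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTES 64u    /* arbitrary guard size on each side */
#define GUARD_FILL  0xA5u  /* arbitrary pattern written into the guards */

/* Allocate `size` usable bytes with a filled guard band before and after. */
static uint8_t *guarded_alloc(size_t size)
{
    uint8_t *raw;

    if (size > SIZE_MAX - 2 * GUARD_BYTES)
        return NULL;
    raw = malloc(size + 2 * GUARD_BYTES);
    if (raw == NULL)
        return NULL;
    memset(raw, GUARD_FILL, GUARD_BYTES);
    memset(raw + GUARD_BYTES + size, GUARD_FILL, GUARD_BYTES);
    return raw + GUARD_BYTES;
}

/* Number of guard bytes that no longer hold the fill pattern. */
static size_t guarded_check(const uint8_t *buf, size_t size)
{
    const uint8_t *front, *back;
    size_t i, bad = 0;

    assert(buf != NULL);
    front = buf - GUARD_BYTES;
    back = buf + size;
    for (i = 0; i < GUARD_BYTES; i++) {
        bad += (front[i] != GUARD_FILL);
        bad += (back[i] != GUARD_FILL);
    }
    return bad;
}

/* Release a buffer obtained from guarded_alloc. */
static void guarded_free(uint8_t *buf)
{
    if (buf != NULL)
        free(buf - GUARD_BYTES);
}
```

A nonzero result from guarded_check after running a combiner would point at a write just outside the buffer. The check has clear limits: a write that lands beyond the guard band is missed, and so is a read-modify-write that stores back the bytes it read, since the pattern stays intact. Either case would be consistent with extra margin not making the corruption go away.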

-- 
Len Sorensen

