[Pixman] [PATCH 1/2] vmx: align destination to fix valgrind invalid memory writes

Søren Sandmann sandmann at cs.au.dk
Sat Sep 28 14:52:36 PDT 2013


Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> The SIMD optimized inner loops in the VMX/Altivec code are trying
> to emulate unaligned accesses to the destination buffer. For each
> 4 pixels (which fit into a 128-bit register) the current
> implementation:
>   1. first performs two aligned reads, which cover the needed data
>   2. reshuffles bytes to get the needed data in a single vector register
>   3. does all the necessary calculations
>   4. reshuffles bytes back to their original location in two registers
>   5. performs two aligned writes back to the destination buffer
>
> Unfortunately in the case if the destination buffer is unaligned and
> the width is a perfect multiple of 4 pixels, we may have some writes
> crossing the boundaries of the destination buffer. In a multithreaded
> environment this may potentially corrupt the data outside of the
> destination buffer if it is concurrently read and written by some
> other thread.
>
> It is the primary suspect for the "make check" failure on power7 hardware:
>     http://lists.freedesktop.org/archives/pixman/2013-August/002871.html
>
> The valgrind report for blitters-test is full of:
>
> ==23085== Invalid write of size 8
> ==23085==    at 0x1004B0B4: vmx_combine_add_u (pixman-vmx.c:1089)
> ==23085==    by 0x100446EF: general_composite_rect (pixman-general.c:214)
> ==23085==    by 0x10002537: test_composite (blitters-test.c:363)
> ==23085==    by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
> ==23085==    by 0x10004943: fuzzer_test_main (utils.c:728)
> ==23085==    by 0x10002C17: main (blitters-test.c:397)
> ==23085==  Address 0x5188218 is 0 bytes after a block of size 88 alloc'd
> ==23085==    at 0x4051DA0: memalign (vg_replace_malloc.c:581)
> ==23085==    by 0x4051E7B: posix_memalign (vg_replace_malloc.c:709)
> ==23085==    by 0x10004CFF: aligned_malloc (utils.c:833)
> ==23085==    by 0x10001DCB: create_random_image (blitters-test.c:47)
> ==23085==    by 0x10002263: test_composite (blitters-test.c:283)
> ==23085==    by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
> ==23085==    by 0x10004943: fuzzer_test_main (utils.c:728)
> ==23085==    by 0x10002C17: main (blitters-test.c:397)
>
> This patch addresses the problem by first aligning the destination
> buffer at a 16 byte boundary in each combiner function. This trick
> is borrowed from the pixman SSE2 code.

Looks good to me.

The patches here:

   http://lists.freedesktop.org/archives/pixman/2013-September/002976.html

introduce a new thread-test program that is intended to exercise the
bug, and I verified that if a similar issue is introduced on x86:

    http://people.freedesktop.org/~sandmann/0001-HACK-introduce-bug-in-core_combine_over_u_sse2_no_ma.patch

thread-test does detect it (whereas blitters-test doesn't). However, I
still don't have a PowerPC (let alone a Power7), so I'm not sure if it
detects the bug there. If it does, though, it may be worthwhile rebasing
your fixes on top of it to verify that they do indeed fix the bug.


Søren


More information about the Pixman mailing list