[Pixman] [PATCH 1/2] vmx: align destination to fix valgrind invalid memory writes

Siarhei Siamashka siarhei.siamashka at gmail.com
Sun Sep 29 03:27:20 PDT 2013


On Sun, 29 Sep 2013 04:02:21 +0200
sandmann at cs.au.dk (Søren Sandmann) wrote:

> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> 
> > On Sat, 28 Sep 2013 23:52:36 +0200
> > sandmann at cs.au.dk (Søren Sandmann) wrote:
> >
> >> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
> >> 
> >> > The SIMD optimized inner loops in the VMX/Altivec code are trying
> >> > to emulate unaligned accesses to the destination buffer. For each
> >> > 4 pixels (which fit into a 128-bit register) the current
> >> > implementation:
> >> >   1. first performs two aligned reads, which cover the needed data
> >> >   2. reshuffles bytes to get the needed data in a single vector register
> >> >   3. does all the necessary calculations
> >> >   4. reshuffles bytes back to their original location in two registers
> >> >   5. performs two aligned writes back to the destination buffer
> >> >
> >> > Unfortunately in the case if the destination buffer is unaligned and
> >> > the width is a perfect multiple of 4 pixels, we may have some writes
> >> > crossing the boundaries of the destination buffer. In a multithreaded
> >> > environment this may potentially corrupt the data outside of the
> >> > destination buffer if it is concurrently read and written by some
> >> > other thread.
> >> >
> >> > It is the primary suspect for the "make check" failure on power7 hardware:
> >> >     http://lists.freedesktop.org/archives/pixman/2013-August/002871.html
> >> >
> >> > The valgrind report for blitters-test is full of:
> >> >
> >> > =#085== Invalid write of size 8
> >> > =#085==    at 0x1004B0B4: vmx_combine_add_u (pixman-vmx.c:1089)
> >> > =#085==    by 0x100446EF: general_composite_rect (pixman-general.c:214)
> >> > =#085==    by 0x10002537: test_composite (blitters-test.c:363)
> >> > =#085==    by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
> >> > =#085==    by 0x10004943: fuzzer_test_main (utils.c:728)
> >> > =#085==    by 0x10002C17: main (blitters-test.c:397)
> >> > =#085==  Address 0x5188218 is 0 bytes after a block of size 88 alloc'd
> >> > =#085==    at 0x4051DA0: memalign (vg_replace_malloc.c:581)
> >> > =#085==    by 0x4051E7B: posix_memalign (vg_replace_malloc.c:709)
> >> > =#085==    by 0x10004CFF: aligned_malloc (utils.c:833)
> >> > =#085==    by 0x10001DCB: create_random_image (blitters-test.c:47)
> >> > =#085==    by 0x10002263: test_composite (blitters-test.c:283)
> >> > =#085==    by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
> >> > =#085==    by 0x10004943: fuzzer_test_main (utils.c:728)
> >> > =#085==    by 0x10002C17: main (blitters-test.c:397)
> >> >
> >> > This patch addresses the problem by first aligning the destination
> >> > buffer at a 16 byte boundary in each combiner function. This trick
> >> > is borrowed from the pixman SSE2 code.
> >> 
> >> Looks good to me.
> >> 
> >> The patches here:
> >> 
> >>    http://lists.freedesktop.org/archives/pixman/2013-September/002976.html
> >> 
> >> introduce a new thread-test program that is intended to exercise the
> >> bug, and I verified that if a similar issue is introduced on x86:
> >> 
> >>     http://people.freedesktop.org/~sandmann/0001-HACK-introduce-bug-in-core_combine_over_u_sse2_no_ma.patch
> >> 
> >> thread-test does detect it (whereas blitters-test doesn't). However, I
> >> still don't have a PowerPC (let alone a Power7), so I'm not sure if it
> >> detects the bug there. If it does, though, it may be worthwhile rebasing
> >> your fixes on top of it to verify that they do indeed fix the bug.
> >
> > Thanks, that was an interesting test. But in its current form, it does
> > not seem to detect the bug yet.
> 
> Okay, new version to follow that incorporates your endianness and
> undefinedness fixes plus uses wider compositing.
> 
> Also available in the thread-test branch of my repository.

Now the thread-test does the job as expected (fails with the old
vmx code and passes after the fixes to vmx are applied).

-- 
Best regards,
Siarhei Siamashka


More information about the Pixman mailing list