[Pixman] [PATCH 1/2] vmx: align destination to fix valgrind invalid memory writes

Søren Sandmann sandmann at cs.au.dk
Sat Sep 28 19:02:21 PDT 2013


Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:

> On Sat, 28 Sep 2013 23:52:36 +0200
> sandmann at cs.au.dk (Søren Sandmann) wrote:
>
>> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
>> 
>> > The SIMD optimized inner loops in the VMX/Altivec code are trying
>> > to emulate unaligned accesses to the destination buffer. For each
>> > 4 pixels (which fit into a 128-bit register) the current
>> > implementation:
>> >   1. first performs two aligned reads, which cover the needed data
>> >   2. reshuffles bytes to get the needed data in a single vector register
>> >   3. does all the necessary calculations
>> >   4. reshuffles bytes back to their original location in two registers
>> >   5. performs two aligned writes back to the destination buffer
>> >
>> > Unfortunately in the case if the destination buffer is unaligned and
>> > the width is a perfect multiple of 4 pixels, we may have some writes
>> > crossing the boundaries of the destination buffer. In a multithreaded
>> > environment this may potentially corrupt the data outside of the
>> > destination buffer if it is concurrently read and written by some
>> > other thread.
>> >
>> > It is the primary suspect for the "make check" failure on power7 hardware:
>> >     http://lists.freedesktop.org/archives/pixman/2013-August/002871.html
>> >
>> > The valgrind report for blitters-test is full of:
>> >
>> > =#085== Invalid write of size 8
>> > =#085==    at 0x1004B0B4: vmx_combine_add_u (pixman-vmx.c:1089)
>> > =#085==    by 0x100446EF: general_composite_rect (pixman-general.c:214)
>> > =#085==    by 0x10002537: test_composite (blitters-test.c:363)
>> > =#085==    by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
>> > =#085==    by 0x10004943: fuzzer_test_main (utils.c:728)
>> > =#085==    by 0x10002C17: main (blitters-test.c:397)
>> > =#085==  Address 0x5188218 is 0 bytes after a block of size 88 alloc'd
>> > =#085==    at 0x4051DA0: memalign (vg_replace_malloc.c:581)
>> > =#085==    by 0x4051E7B: posix_memalign (vg_replace_malloc.c:709)
>> > =#085==    by 0x10004CFF: aligned_malloc (utils.c:833)
>> > =#085==    by 0x10001DCB: create_random_image (blitters-test.c:47)
>> > =#085==    by 0x10002263: test_composite (blitters-test.c:283)
>> > =#085==    by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
>> > =#085==    by 0x10004943: fuzzer_test_main (utils.c:728)
>> > =#085==    by 0x10002C17: main (blitters-test.c:397)
>> >
>> > This patch addresses the problem by first aligning the destination
>> > buffer at a 16 byte boundary in each combiner function. This trick
>> > is borrowed from the pixman SSE2 code.
>> 
>> Looks good to me.
>> 
>> The patches here:
>> 
>>    http://lists.freedesktop.org/archives/pixman/2013-September/002976.html
>> 
>> introduce a new thread-test program that is intended to exercise the
>> bug, and I verified that if a similar issue is introduced on x86:
>> 
>>     http://people.freedesktop.org/~sandmann/0001-HACK-introduce-bug-in-core_combine_over_u_sse2_no_ma.patch
>> 
>> thread-test does detect it (whereas blitters-test doesn't). However, I
>> still don't have a PowerPC (let alone a Power7), so I'm not sure if it
>> detects the bug there. If it does, though, it may be worthwhile rebasing
>> your fixes on top of it to verify that they do indeed fix the bug.
>
> Thanks, that was an interesting test. But in its current form, it does
> not seem to detect the bug yet.

Okay, new version to follow that incorporates your endianness and
undefinedness fixes plus uses wider compositing.

Also available in the thread-test branch of my repository.


Søren


More information about the Pixman mailing list