[Pixman] [PATCH 1/2] vmx: align destination to fix valgrind invalid memory writes
Siarhei Siamashka
siarhei.siamashka at gmail.com
Sat Sep 28 18:29:32 PDT 2013
On Sat, 28 Sep 2013 23:52:36 +0200
sandmann at cs.au.dk (Søren Sandmann) wrote:
> Siarhei Siamashka <siarhei.siamashka at gmail.com> writes:
>
> > The SIMD optimized inner loops in the VMX/Altivec code are trying
> > to emulate unaligned accesses to the destination buffer. For each
> > 4 pixels (which fit into a 128-bit register) the current
> > implementation:
> > 1. first performs two aligned reads, which cover the needed data
> > 2. reshuffles bytes to get the needed data in a single vector register
> > 3. does all the necessary calculations
> > 4. reshuffles bytes back to their original location in two registers
> > 5. performs two aligned writes back to the destination buffer
> >
> > Unfortunately in the case if the destination buffer is unaligned and
> > the width is a perfect multiple of 4 pixels, we may have some writes
> > crossing the boundaries of the destination buffer. In a multithreaded
> > environment this may potentially corrupt the data outside of the
> > destination buffer if it is concurrently read and written by some
> > other thread.
> >
> > It is the primary suspect for the "make check" failure on power7 hardware:
> > http://lists.freedesktop.org/archives/pixman/2013-August/002871.html
> >
> > The valgrind report for blitters-test is full of:
> >
> > ==23085== Invalid write of size 8
> > ==23085== at 0x1004B0B4: vmx_combine_add_u (pixman-vmx.c:1089)
> > ==23085== by 0x100446EF: general_composite_rect (pixman-general.c:214)
> > ==23085== by 0x10002537: test_composite (blitters-test.c:363)
> > ==23085== by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
> > ==23085== by 0x10004943: fuzzer_test_main (utils.c:728)
> > ==23085== by 0x10002C17: main (blitters-test.c:397)
> > ==23085== Address 0x5188218 is 0 bytes after a block of size 88 alloc'd
> > ==23085== at 0x4051DA0: memalign (vg_replace_malloc.c:581)
> > ==23085== by 0x4051E7B: posix_memalign (vg_replace_malloc.c:709)
> > ==23085== by 0x10004CFF: aligned_malloc (utils.c:833)
> > ==23085== by 0x10001DCB: create_random_image (blitters-test.c:47)
> > ==23085== by 0x10002263: test_composite (blitters-test.c:283)
> > ==23085== by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733)
> > ==23085== by 0x10004943: fuzzer_test_main (utils.c:728)
> > ==23085== by 0x10002C17: main (blitters-test.c:397)
> >
> > This patch addresses the problem by first aligning the destination
> > buffer at a 16 byte boundary in each combiner function. This trick
> > is borrowed from the pixman SSE2 code.
>
> Looks good to me.
>
> The patches here:
>
> http://lists.freedesktop.org/archives/pixman/2013-September/002976.html
>
> introduce a new thread-test program that is intended to exercise the
> bug, and I verified that if a similar issue is introduced on x86:
>
> http://people.freedesktop.org/~sandmann/0001-HACK-introduce-bug-in-core_combine_over_u_sse2_no_ma.patch
>
> thread-test does detect it (whereas blitters-test doesn't). However, I
> still don't have a PowerPC (let alone a Power7), so I'm not sure if it
> detects the bug there. If it does, though, it may be worthwhile rebasing
> your fixes on top of it to verify that they do indeed fix the bug.
Thanks, that was an interesting test. But in its current form, it does
not seem to detect the bug yet.
The blitters-test is using aligned_malloc, which makes it unlikely to
hit this problem. I also tried to experiment with jemalloc using
some tuning and also setting OMP_NUM_THREADS=100 when running
affine-test (it uses malloc), but could not reproduce anything
either.
--
Best regards,
Siarhei Siamashka
More information about the Pixman
mailing list