[Mesa-dev] [PATCH] mesa: Add missing _mm_mfence() before streaming loads.

Nuzman, Joseph joseph.nuzman at intel.com
Sat Oct 3 23:20:20 PDT 2015


You need to ensure that the streaming loads do not execute speculatively too early (before the data to be copied has been produced).  The MFENCE at the start of the copy routine is sufficient to ensure that the loads don't see stale data older than that fence.

All loads must be completed at the time of their retirement.  There are no special ordering requirements to ensure completion of the streaming loads.

Joe


-----Original Message-----
From: Justen, Jordan L 
Sent: Thursday, October 01, 2015 21:20
To: Matt Turner; mesa-dev at lists.freedesktop.org
Cc: Nuzman, Joseph
Subject: Re: [Mesa-dev] [PATCH] mesa: Add missing _mm_mfence() before streaming loads.

On 2015-10-01 10:11:33, Matt Turner wrote:
> According to the Intel Software Development Manual:

How about a more specific doc location?

According to the Intel Software Development Manual (Volume 1: Basic Architecture, 12.10.3 Streaming Load Hint Instruction):

>    Streaming loads may be weakly ordered and may appear to software to
>    execute out of order with respect to other memory operations.
>    Software must explicitly use fences (e.g. MFENCE) if it needs to
>    preserve order among streaming loads or between streaming loads and
>    other memory operations.

Does this mean we need a mfence following the load as well?

-Jordan

> That is, a memory fence is needed to preserve the order between the 
> GPU writing the buffer and the streaming loads reading it back.
> 
> Reported-by: Joseph Nuzman <joseph.nuzman at intel.com>
> ---
>  src/mesa/main/streaming-load-memcpy.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/src/mesa/main/streaming-load-memcpy.c 
> b/src/mesa/main/streaming-load-memcpy.c
> index d7147af..32854b6 100644
> --- a/src/mesa/main/streaming-load-memcpy.c
> +++ b/src/mesa/main/streaming-load-memcpy.c
> @@ -54,16 +54,19 @@ _mesa_streaming_load_memcpy(void *restrict dst, 
> void *restrict src, size_t len)
>  
>        memcpy(d, s, MIN2(bytes_before_alignment_boundary, len));
>  
>        d = (char *)ALIGN((uintptr_t)d, 16);
>        s = (char *)ALIGN((uintptr_t)s, 16);
>        len -= MIN2(bytes_before_alignment_boundary, len);
>     }
>  
> +   if (len >= 64)
> +      _mm_mfence();
> +
>     while (len >= 64) {
>        __m128i *dst_cacheline = (__m128i *)d;
>        __m128i *src_cacheline = (__m128i *)s;
>  
>        __m128i temp1 = _mm_stream_load_si128(src_cacheline + 0);
>        __m128i temp2 = _mm_stream_load_si128(src_cacheline + 1);
>        __m128i temp3 = _mm_stream_load_si128(src_cacheline + 2);
>        __m128i temp4 = _mm_stream_load_si128(src_cacheline + 3);
> --
> 2.4.9
> 
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


More information about the mesa-dev mailing list