[Mesa-dev] [PATCH 09/13] i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear

Chris Wilson chris at chris-wilson.co.uk
Fri May 25 23:04:35 UTC 2018


Quoting Scott D Phillips (2018-04-30 18:25:48)
> +#if defined(USE_SSE41)
> +static ALWAYS_INLINE void *
> +_memcpy_streaming_load(void *dest, const void *src, size_t count)
> +{
> +   if (count == 16) {
> +      __m128i val = _mm_stream_load_si128((__m128i *)src);
> +      _mm_store_si128((__m128i *)dest, val);
> +      return dest;
> +   } else if (count == 64) {
> +      __m128i val0 = _mm_stream_load_si128(((__m128i *)src) + 0);
> +      __m128i val1 = _mm_stream_load_si128(((__m128i *)src) + 1);
> +      __m128i val2 = _mm_stream_load_si128(((__m128i *)src) + 2);
> +      __m128i val3 = _mm_stream_load_si128(((__m128i *)src) + 3);
> +      _mm_store_si128(((__m128i *)dest) + 0, val0);
> +      _mm_store_si128(((__m128i *)dest) + 1, val1);
> +      _mm_store_si128(((__m128i *)dest) + 2, val2);
> +      _mm_store_si128(((__m128i *)dest) + 3, val3);
> +      return dest;

I didn't spot this before, but we use this to copy from an aligned
(tiled) source to an unaligned user buffer.

s/_mm_store_si128/_mm_storeu_si128/
                           ^ very important :)
-Chris
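Below is a minimal sketch of the quoted helper with the suggested substitution applied. It is not the code that landed in Mesa. The function name `memcpy_streaming_load`, the GCC/Clang `target("sse4.1")` attribute (standing in for Mesa's `USE_SSE41` build gate), and the `memcpy` fallback for other sizes are all assumptions for illustration. The loads stay `_mm_stream_load_si128` (movntdqa), which needs a 16-byte-aligned source; the tiled surface satisfies that. The stores become `_mm_storeu_si128`, which accepts any destination alignment. By contrast, `_mm_store_si128` (movdqa/movaps) raises a general-protection fault when `dest` is not 16-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h> /* SSE4.1 intrinsics; also pulls in SSE2 */

/* Hypothetical stand-in for the patch's _memcpy_streaming_load().
 * target("sse4.1") (GCC/Clang) replaces Mesa's USE_SSE41 build gate here. */
__attribute__((target("sse4.1")))
static void *
memcpy_streaming_load(void *dest, const void *src, size_t count)
{
   /* movntdqa requires a 16-byte-aligned source (the tiled surface). */
   assert(((uintptr_t)src & 15) == 0);

   if (count == 16) {
      __m128i val = _mm_stream_load_si128((__m128i *)src);
      /* dest is a user buffer of arbitrary alignment: unaligned store. */
      _mm_storeu_si128((__m128i *)dest, val);
      return dest;
   } else if (count == 64) {
      __m128i val0 = _mm_stream_load_si128(((__m128i *)src) + 0);
      __m128i val1 = _mm_stream_load_si128(((__m128i *)src) + 1);
      __m128i val2 = _mm_stream_load_si128(((__m128i *)src) + 2);
      __m128i val3 = _mm_stream_load_si128(((__m128i *)src) + 3);
      _mm_storeu_si128(((__m128i *)dest) + 0, val0);
      _mm_storeu_si128(((__m128i *)dest) + 1, val1);
      _mm_storeu_si128(((__m128i *)dest) + 2, val2);
      _mm_storeu_si128(((__m128i *)dest) + 3, val3);
      return dest;
   }
   /* Assumption: other sizes fall back to plain memcpy. */
   return memcpy(dest, src, count);
}
#else
/* Non-x86 fallback so the sketch still compiles elsewhere. */
static void *
memcpy_streaming_load(void *dest, const void *src, size_t count)
{
   return memcpy(dest, src, count);
}
#endif
```

Given a 16-byte-aligned `buf`, a call such as `memcpy_streaming_load(buf + 1, src, 64)` would be expected to fault with the aligned store. With `_mm_storeu_si128` it copies correctly.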

