[Intel-gfx] [PATCH igt] igt/gem_fence_thresh: Use streaming reads for verify

Mon Oct 9 13:36:27 UTC 2017

Title: s/thresh/thrash/

On Wed, 2017-08-23 at 13:55 +0100, Chris Wilson wrote:
> At the moment, the verify tests use an extremely brutal write-read of
> every dword, degrading performance to UC. If we break those up into
> cachelines, we can do a wcb write/read at a time instead, roughly 8x
> faster. We lose the accuracy of the forced wcb flushes around every dword,
> but we retain the overall behaviour of checking reads following
> writes. To compensate, we first check that a single dword write/read
> works before using wcb-aligned accesses.
> 
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
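A minimal sketch of the verification pattern the commit message describes: one single-dword write/read check, then cacheline-sized write bursts each followed by a read-back. The names (verify_page, SKETCH_PAGE_SIZE, SKETCH_CACHELINE, DWORDS_PER_LINE) are hypothetical and not taken from the patch. An ordinary buffer stands in for the fenced GTT mapping, so this shows only the access pattern, not the WC behaviour itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the igt PAGE_SIZE/CACHELINE constants. */
#define SKETCH_PAGE_SIZE 4096
#define SKETCH_CACHELINE 64
#define DWORDS_PER_LINE (SKETCH_CACHELINE / sizeof(uint32_t))

/* Sanity check: a lone dword store must be visible to the next load. */
static bool verify_single_dword(volatile uint32_t *ptr, uint32_t val)
{
	*ptr = val;
	return *ptr == val;
}

/* Fill one cacheline with consecutive stores, then read it back, so the
 * write/read round trip happens once per 64 bytes rather than once per
 * dword. On a WC mapping the 16 stores can combine in one WC buffer. */
static bool verify_page_by_cacheline(volatile uint32_t *page, uint32_t seed)
{
	for (size_t line = 0; line < SKETCH_PAGE_SIZE / SKETCH_CACHELINE; line++) {
		volatile uint32_t *p = page + line * DWORDS_PER_LINE;
		uint32_t base = seed + (uint32_t)(line * DWORDS_PER_LINE);

		for (size_t i = 0; i < DWORDS_PER_LINE; i++)
			p[i] = base + (uint32_t)i;

		for (size_t i = 0; i < DWORDS_PER_LINE; i++)
			if (p[i] != base + (uint32_t)i)
				return false;
	}
	return true;
}

/* Single-dword check first, then the faster cacheline-granular pass. */
static bool verify_page(volatile uint32_t *page, uint32_t seed)
{
	assert(page != NULL);
	if (!verify_single_dword(page, ~seed))
		return false;
	return verify_page_by_cacheline(page, seed);
}
```

The trade-off matches the description above: the per-dword probe keeps some coverage of the "read immediately after write" case, while the bulk of the page is checked in cacheline bursts.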

<SNIP>

> @@ -104,15 +109,78 @@ bo_copy (void *_arg)
>  	return NULL;
>  }
>  
> +#if defined(__x86_64__) && !defined(__clang__)
> +#define MOVNT 512
> +
> +#pragma GCC push_options
> +#pragma GCC target("sse4.1")
> +
> +#include <smmintrin.h>
> +__attribute__((noinline))
> +static void copy_wc_page(void *dst, void *src)
> +{
> +	if (igt_x86_features() & SSE4_1) {
> +		__m128i *S = (__m128i *)src;
> +		__m128i *D = (__m128i *)dst;
> +
> +		for (int i = 0; i < PAGE_SIZE/CACHELINE; i++) {
> +			__m128i tmp[4];
> +
> +			tmp[0] = _mm_stream_load_si128(S++);
> +			tmp[1] = _mm_stream_load_si128(S++);
> +			tmp[2] = _mm_stream_load_si128(S++);
> +			tmp[3] = _mm_stream_load_si128(S++);
> +
> +			_mm_store_si128(D++, tmp[0]);
> +			_mm_store_si128(D++, tmp[1]);
> +			_mm_store_si128(D++, tmp[2]);
> +			_mm_store_si128(D++, tmp[3]);
> +		}
> +	} else
> +		memcpy(dst, src, PAGE_SIZE);
> +}

Not lib/ material?

Add newline anyway.
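For illustration only, since the hunk that calls copy_wc_page() is snipped: one way a streaming-read verify could work is to pull the page into a cached, 16-byte-aligned bounce buffer with MOVNTDQA loads, then check the copy with ordinary loads. This sketch is an assumption, not the patch's code. Instead of the patch's #pragma GCC target and igt_x86_features(), it uses the GCC/Clang __attribute__((target("sse4.1"))) and __builtin_cpu_supports() builtins, and all names (copy_page, verify_page_streaming, SKETCH_PAGE_SIZE) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096 /* hypothetical stand-in for PAGE_SIZE */

#if defined(__x86_64__) && defined(__GNUC__)
#define SKETCH_HAVE_MOVNTDQA 1
#include <immintrin.h>

/* MOVNTDQA (SSE4.1) streaming loads, one 64-byte cacheline per
 * iteration. Both pointers must be 16-byte aligned. */
__attribute__((target("sse4.1"), noinline))
static void copy_page_movntdqa(void *dst, void *src)
{
	__m128i *S = src;
	__m128i *D = dst;

	for (int i = 0; i < SKETCH_PAGE_SIZE / 64; i++) {
		__m128i t0 = _mm_stream_load_si128(S++);
		__m128i t1 = _mm_stream_load_si128(S++);
		__m128i t2 = _mm_stream_load_si128(S++);
		__m128i t3 = _mm_stream_load_si128(S++);

		_mm_store_si128(D++, t0);
		_mm_store_si128(D++, t1);
		_mm_store_si128(D++, t2);
		_mm_store_si128(D++, t3);
	}
}
#endif

/* Use streaming loads when the CPU has SSE4.1, else plain memcpy. */
static void copy_page(void *dst, void *src)
{
#ifdef SKETCH_HAVE_MOVNTDQA
	if (__builtin_cpu_supports("sse4.1")) {
		copy_page_movntdqa(dst, src);
		return;
	}
#endif
	memcpy(dst, src, SKETCH_PAGE_SIZE);
}

/* Read the (possibly WC-mapped) page once, then verify the cached copy. */
static bool verify_page_streaming(void *page, uint32_t seed)
{
	_Alignas(16) uint32_t bounce[SKETCH_PAGE_SIZE / sizeof(uint32_t)];

	assert(((uintptr_t)page & 15) == 0); /* MOVNTDQA needs alignment */
	copy_page(bounce, page);

	for (size_t i = 0; i < SKETCH_PAGE_SIZE / sizeof(uint32_t); i++)
		if (bounce[i] != seed + (uint32_t)i)
			return false;
	return true;
}
```

On ordinary write-back memory the non-temporal hint is simply ignored, so the result is the same as memcpy; the benefit only shows up on WC mappings, where each MOVNTDQA can fetch a full line instead of taking an uncached read per access. A per-thread stack buffer is used rather than a static one because the test runs copies from multiple threads.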

Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation