[Intel-gfx] [PATCH igt] igt/gem_fence_thresh: Use streaming reads for verify
Joonas Lahtinen
joonas.lahtinen at linux.intel.com
Mon Oct 9 13:36:27 UTC 2017
Title: s/thresh/thrash/
On Wed, 2017-08-23 at 13:55 +0100, Chris Wilson wrote:
> At the moment, the verify tests use an extremely brutal write-read of
> every dword, degrading performance to UC. If we break those up into
> cachelines, we can do a wcb write/read at a time instead, roughly 8x
> faster. We lose the accuracy of the forced wcb flushes around every dword,
> but we are retaining the overall behaviour of checking reads following
> writes instead. To compensate, we do check a single dword write/read
> before using wcb-aligned accesses.
>
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
<SNIP>
> @@ -104,15 +109,78 @@ bo_copy (void *_arg)
>  	return NULL;
> }
>
> +#if defined(__x86_64__) && !defined(__clang__)
> +#define MOVNT 512
> +
> +#pragma GCC push_options
> +#pragma GCC target("sse4.1")
> +
> +#include <smmintrin.h>
> +__attribute__((noinline))
> +static void copy_wc_page(void *dst, void *src)
> +{
> +	if (igt_x86_features() & SSE4_1) {
> +		__m128i *S = (__m128i *)src;
> +		__m128i *D = (__m128i *)dst;
> +
> +		for (int i = 0; i < PAGE_SIZE/CACHELINE; i++) {
> +			__m128i tmp[4];
> +
> +			tmp[0] = _mm_stream_load_si128(S++);
> +			tmp[1] = _mm_stream_load_si128(S++);
> +			tmp[2] = _mm_stream_load_si128(S++);
> +			tmp[3] = _mm_stream_load_si128(S++);
> +
> +			_mm_store_si128(D++, tmp[0]);
> +			_mm_store_si128(D++, tmp[1]);
> +			_mm_store_si128(D++, tmp[2]);
> +			_mm_store_si128(D++, tmp[3]);
> +		}
> +	} else
> +		memcpy(dst, src, PAGE_SIZE);
> +}
Not lib/ material?
Add newline anyway.
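For anyone who wants to try the quoted helper outside the IGT tree, a minimal standalone sketch might look like the following. It is not the patch itself: the sketch_* names, the 4096-byte page and 64-byte cacheline sizes, and the use of the compiler builtin __builtin_cpu_supports() in place of igt_x86_features() are all assumptions made for illustration. Both buffers must be 16-byte aligned. On ordinary cached memory the streaming load behaves like a normal load, so this only demonstrates the cacheline-at-a-time copy and the dword verify that follows it, not the speed-up on a real WC mapping.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* assumed page size */
#define SKETCH_CACHELINE 64	/* assumed cacheline size */

#if defined(__x86_64__)
#include <smmintrin.h>

/* Copy one page, a 64-byte cacheline per iteration, using SSE4.1
 * streaming loads (MOVNTDQA) followed by ordinary aligned stores. */
__attribute__((target("sse4.1")))
static void sketch_copy_page_movntdqa(void *dst, void *src)
{
	__m128i *s = (__m128i *)src;
	__m128i *d = (__m128i *)dst;

	for (int i = 0; i < SKETCH_PAGE_SIZE / SKETCH_CACHELINE; i++) {
		__m128i tmp[4];

		/* Four 16-byte loads cover one cacheline. */
		tmp[0] = _mm_stream_load_si128(s++);
		tmp[1] = _mm_stream_load_si128(s++);
		tmp[2] = _mm_stream_load_si128(s++);
		tmp[3] = _mm_stream_load_si128(s++);

		_mm_store_si128(d++, tmp[0]);
		_mm_store_si128(d++, tmp[1]);
		_mm_store_si128(d++, tmp[2]);
		_mm_store_si128(d++, tmp[3]);
	}
}
#endif

/* Hypothetical wrapper: use the streaming path when the CPU reports
 * SSE4.1, otherwise fall back to memcpy() as the quoted patch does. */
static void sketch_copy_page(void *dst, void *src)
{
	assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
#if defined(__x86_64__)
	if (__builtin_cpu_supports("sse4.1")) {
		sketch_copy_page_movntdqa(dst, src);
		return;
	}
#endif
	memcpy(dst, src, SKETCH_PAGE_SIZE);
}

/* Verify the copied page in cached memory: return the index of the
 * first dword that differs from base + index, or -1 if all match. */
static int sketch_first_mismatch(const uint32_t *page, uint32_t base)
{
	for (int i = 0; i < SKETCH_PAGE_SIZE / 4; i++)
		if (page[i] != base + (uint32_t)i)
			return i;
	return -1;
}
```

The intended use would be to fill the source page with a known pattern, call sketch_copy_page() into a 16-byte-aligned scratch buffer, and then run sketch_first_mismatch() on the scratch copy instead of reading the mapping dword by dword.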
Reviewed-by: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
Regards, Joonas
--
Joonas Lahtinen
Open Source Technology Center
Intel Corporation