[Intel-gfx] [PATCH 18/20] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory

Fri Aug 12 12:22:45 UTC 2016

On Fri, Aug 12, 2016 at 11:54:04AM +0100, Tvrtko Ursulin wrote:
> On 12/08/16 07:25, akash.goel at intel.com wrote:
> >From: Chris Wilson <chris at chris-wilson.co.uk>
> >
> >This patch provides the infrastructure for performing a 16-byte aligned
> >read from WC memory using non-temporal instructions introduced with sse4.1.
> >Using movntdqa we can bypass the CPU caches and read directly from memory,
> >ignoring the page attributes set on the CPU PTE, i.e. negating the
> >impact of an otherwise UC access. Copying using movntdqa from WC is almost
> >as fast as reading from WB memory, modulo the possibility of both hitting
> >the CPU cache or leaving the data in the CPU cache for the next consumer.
> >(The CPU cache itself may be flushed for the region of the movntdqa and on
> >later access the movntdqa reads from a separate internal buffer for the
> >cacheline.) The write back to the memory is however cached.
> >
> >This will be used in later patches to accelerate accessing WC memory.
> >
> >v2: Report whether the accelerated copy is successful/possible.
> >v3: Function alignment override was only necessary when using the
> >function target("sse4.1") - which is not necessary for emitting movntdqa
> >from __asm__.
> >v4: Improve notes on CPU cache behaviour vs non-temporal stores.
> >v5: Fix byte offsets for unrolled moves.
> >v6: Find all remaining typos of movntqda, use kernel_fpu_begin.
> >
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >Cc: Akash Goel <akash.goel at intel.com>
> >Cc: Damien Lespiau <damien.lespiau at intel.com>
> >Cc: Mika Kuoppala <mika.kuoppala at intel.com>
> >Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
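
For illustration only, a minimal userspace sketch of the copy loop the
commit message describes: a 16-byte aligned load with movntdqa emitted
from __asm__ (so no target("sse4.1") attribute is needed), followed by an
ordinary cached store, with a return value reporting whether the
accelerated path was possible. The name wc_copy_sketch and its checks are
assumptions made for this example and are not the function added by the
patch; in the kernel the SSE section would additionally have to be
bracketed by kernel_fpu_begin()/kernel_fpu_end(), and on ordinary WB
memory, as used here, movntdqa simply behaves like a normal load.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the i915 function): copy len bytes from src to
 * dst using movntdqa loads. Returns false if the accelerated copy is not
 * possible, so the caller can fall back to a plain memcpy().
 */
static bool wc_copy_sketch(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__) && defined(__GNUC__)
	const char *s = src;
	char *d = dst;

	/* movntdqa and movaps both require 16-byte aligned addresses. */
	if (((uintptr_t)src | (uintptr_t)dst | len) & 15)
		return false;

	/* Runtime check: the instruction was introduced with SSE4.1. */
	if (!__builtin_cpu_supports("sse4.1"))
		return false;

	for (; len; len -= 16, s += 16, d += 16) {
		/* Non-temporal 16-byte load, then a regular cached store. */
		__asm__ volatile("movntdqa (%0), %%xmm0\n\t"
				 "movaps %%xmm0, (%1)"
				 : /* no outputs */
				 : "r"(s), "r"(d)
				 : "xmm0", "memory");
	}
	return true;
#else
	/* Not x86-64 or not a GNU-compatible compiler: report "not possible". */
	(void)dst;
	(void)src;
	(void)len;
	return false;
#endif
}
```

A caller would try wc_copy_sketch() first and use memcpy() when it
returns false, mirroring the "report whether the accelerated copy is
successful/possible" note in v2.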

Picked up the 2 WC prep patches. Thanks for the review, testing and
improvements,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre