[Intel-gfx] [CI 2/2] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory

Chris Wilson chris at chris-wilson.co.uk
Fri Aug 12 12:42:24 UTC 2016


On Fri, Aug 12, 2016 at 03:30:55PM +0300, Ville Syrjälä wrote:
> On Fri, Aug 12, 2016 at 12:39:59PM +0100, Chris Wilson wrote:
> > +#ifdef CONFIG_AS_MOVNTDQA
> > +static void __memcpy_ntdqa(void *dst, const void *src, unsigned long len)
> > +{
> > +	kernel_fpu_begin();
> > +
> > +	len >>= 4;
> > +	while (len >= 4) {
> > +		asm("movntdqa   (%0), %%xmm0\n"
> > +		    "movntdqa 16(%0), %%xmm1\n"
> > +		    "movntdqa 32(%0), %%xmm2\n"
> > +		    "movntdqa 48(%0), %%xmm3\n"
> > +		    "movaps %%xmm0,   (%1)\n"
> > +		    "movaps %%xmm1, 16(%1)\n"
> > +		    "movaps %%xmm2, 32(%1)\n"
> > +		    "movaps %%xmm3, 48(%1)\n"
> 
> Not using SSE2 movntdq for the store? Is there no benefit?

At least in the scenarios we (ok, I) have in mind, leaving the dst in the
cache benefits us, as we immediately process the data or move it on.
-Chris
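
To make the trade-off in the reply above concrete, here is a minimal
user-space sketch of the two store choices, written with compiler
intrinsics rather than the inline asm of the quoted patch. It is not the
kernel code: the function name copy_from_wc_sketch and the nt_store flag
are hypothetical, and it assumes GCC or Clang, 16-byte aligned buffers,
and (on x86) a CPU that supports SSE4.1. On ordinary write-back memory
movntdqa behaves like a normal load, so this only illustrates the
structure, not the speed-up seen on WC memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>          /* SSE4.1: _mm_stream_load_si128 */
#define HAVE_SSE41_SKETCH 1
#endif

/*
 * Hypothetical user-space analogue of the quoted loop, not the kernel code.
 * dst and src must be 16-byte aligned; len is in bytes. A tail that is not
 * a multiple of 16 bytes is copied with memcpy.
 *
 * nt_store == 0: ordinary aligned stores, dst stays in the cache
 *                (the choice defended in the reply).
 * nt_store != 0: movntdq-style streaming stores that bypass the cache
 *                (the alternative raised in the review question).
 */
#ifdef HAVE_SSE41_SKETCH
__attribute__((target("sse4.1")))
#endif
static void copy_from_wc_sketch(void *dst, const void *src, size_t len,
                                int nt_store)
{
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
#ifdef HAVE_SSE41_SKETCH
    __m128i *d = (__m128i *)dst;
    __m128i *s = (__m128i *)src;    /* the intrinsic only reads through s */
    size_t n = len >> 4;            /* number of whole 16-byte chunks */
    size_t done = n << 4;           /* bytes covered by those chunks */

    for (size_t i = 0; i < n; i++) {
        __m128i v = _mm_stream_load_si128(s + i);   /* movntdqa */
        if (nt_store)
            _mm_stream_si128(d + i, v);             /* movntdq */
        else
            _mm_store_si128(d + i, v);              /* aligned cached store */
    }
    if (nt_store)
        _mm_sfence();   /* order the streaming stores before later accesses */

    memcpy((char *)dst + done, (const char *)src + done, len - done);
#else
    /* Non-x86 fallback so the sketch still builds and copies correctly. */
    (void)nt_store;
    memcpy(dst, src, len);
#endif
}
```

Calling copy_from_wc_sketch(dst, src, len, 0) mirrors the behaviour
described in the reply: the destination lines remain cached for whatever
consumes them next. Passing a non-zero nt_store would suit a destination
that is not read again soon.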

-- 
Chris Wilson, Intel Open Source Technology Centre

