[Intel-gfx] [PATCH] drm/i915: Use SSE4.1 movntdqa to accelerate reads from WC memory

Chris Wilson chris at chris-wilson.co.uk
Mon Jul 18 11:35:01 UTC 2016


On Mon, Jul 18, 2016 at 12:15:32PM +0100, Tvrtko Ursulin wrote:
> I am not sure about this, but looking at the raid6 for example, it
> has a lot more annotations in cases like this.
> 
> It seems to be telling the compiler which memory ranges each
> instruction accesses, and also uses "asm volatile" - whether or not
> that is really needed I don't know.
> 
> For example:
>                 asm volatile("movdqa %0,%%xmm4" :: "m" (dptr[z0][d]));
>
> And:
>                 asm volatile("movdqa %%xmm4,%0" : "=m" (q[d]));
> 
> Each one is telling the compiler that the instruction is reading
> from or writing to a certain memory address, respectively.
> 
> You don't have any of that, and don't even specify anything as an
> output parameter so I am not sure if your code is safe.

The asm is correct. We do not modify either of the two pointers that we
pass in as register inputs, only the memory behind them - hence the memory
clobber.
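
To illustrate the constraint style described above, here is a minimal
user-space sketch, not the code from the patch. The helper name copy16() is
hypothetical, and it uses plain SSE2 movdqu instead of movntdqa so that it
has no alignment or SSE4.1 requirement. Both pointers are register inputs
that the asm leaves unchanged, there are no outputs, and the "memory"
clobber covers the loads and stores done through them.

```c
#include <string.h>

/*
 * Hypothetical helper: copy 16 bytes from src to dst.
 * The pointers are inputs only; the asm never changes them.
 * The "memory" clobber tells the compiler that memory is read and
 * written behind its back, so it must not cache or reorder accesses
 * across the asm. "xmm0" is listed because the asm overwrites it.
 */
static void copy16(void *dst, const void *src)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__ __volatile__("movdqu (%0), %%xmm0\n\t"
			     "movdqu %%xmm0, (%1)"
			     : /* no outputs */
			     : "r" (src), "r" (dst)
			     : "xmm0", "memory");
#else
	/* Portable fallback so the sketch builds on other targets. */
	memcpy(dst, src, 16);
#endif
}
```

The raid6 style with "m" operands describes exactly which object each
instruction touches; the clobber is the coarser alternative when the asm
walks memory through pointers held in registers.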

> >+void i915_memcpy_init_early(struct drm_i915_private *dev_priv)
> >+{
> >+	if (static_cpu_has(X86_FEATURE_XMM4_1))
> >+		static_branch_enable(&has_movntdqa);
> >+}
> >
> 
> I was not familiar with static key stuff and the only thing I can
> notice is that it is used very little throughout the kernel. On the
> other hand I haven't found any references in the documentation that
> it should be used sparingly or something.
> 
> But the general question would be - is it worth it here? Static
> branches should be really efficient in the off case, correct? And we
> don't really care about the performance of the off case here. So
> would it be just as good to use a normal branch?

It's not the cost of the branch, but the cost of static_cpu_has() in
comparison to a small copy.
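
As a rough user-space analogy only (the kernel static key patches the branch
into the code, which this does not do), the pattern is: probe the CPU once
at init and leave only a cheap test on the copy path. The names
has_fast_copy, copy_init_early() and copy_small() are hypothetical, and
both paths are memcpy() placeholders.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the has_movntdqa static key. */
static bool has_fast_copy;

/* Probe the CPU once, early, in the spirit of i915_memcpy_init_early(). */
static void copy_init_early(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
	has_fast_copy = __builtin_cpu_supports("sse4.1") != 0;
#else
	has_fast_copy = false;
#endif
}

/*
 * Per-call path: only the cached flag is tested, so the feature probe
 * is not repeated for every small copy.
 * Returns true if the (placeholder) accelerated path was taken.
 */
static bool copy_small(void *dst, const void *src, size_t len)
{
	if (has_fast_copy) {
		memcpy(dst, src, len);	/* placeholder for the movntdqa loop */
		return true;
	}
	memcpy(dst, src, len);		/* ordinary fallback */
	return false;
}
```
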
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

