[PATCH 3/6] x86: Add support for the clwb instruction
Ross Zwisler
ross.zwisler at linux.intel.com
Tue Nov 11 11:48:52 PST 2014
On Tue, 2014-11-11 at 20:12 +0100, Borislav Petkov wrote:
> On Tue, Nov 11, 2014 at 11:43:13AM -0700, Ross Zwisler wrote:
> > Add support for the new clwb instruction. This instruction was
> > announced in the document "Intel Architecture Instruction Set Extensions
> > Programming Reference" with reference number 319433-022.
> >
> > https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf
> >
> > Here are some things of note:
> >
> > - As with the clflushopt patches before this, I'm assuming that the addressing
> > mode generated by the original clflush instruction will match the new
> > clflush instruction with the 0x66 prefix for clflushopt, and for the
> > xsaveopt instruction with the 0x66 prefix for clwb. For all the test cases
> > that I've come up with and for the new clwb code generated by this patch
> > series, this has proven to be true on my test machine.
> >
> > - According to the SDM, xsaveopt has a form where it has a REX.W prefix. I
> > believe that this prefix will not be generated by gcc in x86_64 kernel code.
> > Based on this, I don't believe I need to account for this extra prefix when
> > dealing with the assembly language created for clwb. Please correct me if
> > I'm wrong.
> >
> > Signed-off-by: Ross Zwisler <ross.zwisler at linux.intel.com>
> > Cc: H Peter Anvin <h.peter.anvin at intel.com>
> > Cc: Ingo Molnar <mingo at kernel.org>
> > Cc: Thomas Gleixner <tglx at linutronix.de>
> > Cc: David Airlie <airlied at linux.ie>
> > Cc: dri-devel at lists.freedesktop.org
> > Cc: x86 at kernel.org
> > ---
> > arch/x86/include/asm/cpufeature.h | 1 +
> > arch/x86/include/asm/special_insns.h | 10 ++++++++++
> > 2 files changed, 11 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> > index b3e6b89..fbbed34 100644
> > --- a/arch/x86/include/asm/cpufeature.h
> > +++ b/arch/x86/include/asm/cpufeature.h
> > @@ -227,6 +227,7 @@
> > #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */
> > #define X86_FEATURE_PCOMMIT ( 9*32+22) /* PCOMMIT instruction */
> > #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */
> > +#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */
> > #define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */
> > #define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */
> > #define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */
> > diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
> > index 1709a2e..a328460 100644
> > --- a/arch/x86/include/asm/special_insns.h
> > +++ b/arch/x86/include/asm/special_insns.h
> > @@ -199,6 +199,16 @@ static inline void clflushopt(volatile void *__p)
> > "+m" (*(volatile char __force *)__p));
> > }
> >
> > +static inline void clwb(volatile void *__p)
> > +{
> > + alternative_io_2(".byte " __stringify(NOP_DS_PREFIX) "; clflush %P0",
>
> Any particular reason for using 0x3e as a prefix to have the insns be
> the same size or is it simply because CLFLUSH can stomach it?
>
> :-)
Essentially, we need one additional byte at the beginning of the clflush so
that we can flip it into a clflushopt by changing that byte into a 0x66
prefix. The two options are to insert a 1-byte ASM_NOP1 or to add a 1-byte
NOP_DS_PREFIX. Neither has any functional effect on the plain clflush,
but I've been told that executing a clflush with a prefix should be faster
than executing a NOP followed by a clflush.
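
To make the byte layout concrete, here is a rough userspace sketch of the
encodings involved. It is only an illustration, not the kernel's alternatives
code: the (%rax) operand, the array names, and the helpers flip_to_clflushopt()
and modrm_reg() are made up for this example. The opcode bytes follow the
published encodings (clflush is 0F AE /7, clflushopt is 66 0F AE /7, clwb is
66 0F AE /6), and 0x3e is the DS segment-override prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP_DS_PREFIX 0x3e /* DS segment override; no effect on clflush */
#define OPSIZE_PREFIX 0x66 /* selects clflushopt (/7) or clwb (/6) */

/* Assumed operand for illustration: (%rax), i.e. ModRM mod=00, rm=000. */
static const uint8_t clflush_padded[]  = { NOP_DS_PREFIX, 0x0f, 0xae, 0x38 };
static const uint8_t clflushopt_insn[] = { OPSIZE_PREFIX, 0x0f, 0xae, 0x38 };
static const uint8_t clwb_insn[]       = { OPSIZE_PREFIX, 0x0f, 0xae, 0x30 };

/* Extract the ModRM reg field: 7 for clflush/clflushopt, 6 for clwb/xsaveopt. */
static unsigned int modrm_reg(uint8_t modrm)
{
	return (modrm >> 3) & 0x7;
}

/*
 * Copy a DS-prefixed clflush into out and flip its first byte to 0x66.
 * The instruction length does not change, which is the point of the padding.
 */
static void flip_to_clflushopt(const uint8_t *padded, uint8_t *out, size_t len)
{
	assert(len >= 1 && padded[0] == NOP_DS_PREFIX);
	memcpy(out, padded, len);
	out[0] = OPSIZE_PREFIX;
}
```

With the prefix byte in place, all three variants are four bytes long for this
operand, so any one of them can be swapped for another without changing the
size of the instruction. With a bare three-byte clflush there would be no slot
for the 0x66.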