[Mesa-dev] [PATCH] gallium/util: Fix detection of AVX cpu caps
Roland Scheidegger
sroland at vmware.com
Tue Jul 23 11:57:22 PDT 2013
On 23.07.2013 19:08, Andre Heider wrote:
> For AVX it's not sufficient to only rely on the cpuid flags. If the CPU
> supports these extensions, but the OS doesn't, issuing these insns will
> trigger an undefined opcode exception.
>
> In addition to the AVX cpuid bit we also need to:
> * test cpuid for OSXSAVE support
> * XGETBV to check if the OS saves/restores AVX regs on context switches
>
> See "Detecting Availability and Support" at
> http://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
>
> Signed-off-by: Andre Heider <a.heider at gmail.com>
> ---
> src/gallium/auxiliary/util/u_cpu_detect.c | 27 +++++++++++++++++++++++++--
> 1 file changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c b/src/gallium/auxiliary/util/u_cpu_detect.c
> index b118fc8..588fc7c 100644
> --- a/src/gallium/auxiliary/util/u_cpu_detect.c
> +++ b/src/gallium/auxiliary/util/u_cpu_detect.c
> @@ -67,7 +67,7 @@
>
> #if defined(PIPE_OS_WINDOWS)
> #include <windows.h>
> -#if defined(MSVC)
> +#if defined(PIPE_CC_MSVC)
> #include <intrin.h>
> #endif
> #endif
> @@ -211,6 +211,27 @@ cpuid(uint32_t ax, uint32_t *p)
> p[3] = 0;
> #endif
> }
> +
> +static INLINE uint64_t xgetbv(void)
> +{
> +#if defined(PIPE_CC_GCC)
> + uint32_t eax, edx;
> +
> + __asm __volatile (
> + ".byte 0x0f, 0x01, 0xd0" // xgetbv isn't supported on gcc < 4.4
> + : "=a"(eax),
> + "=d"(edx)
> + : "c"(0)
> + );
> +
> + return ((uint64_t)edx << 32) | eax;
> +#elif defined(PIPE_CC_MSVC) && defined(_MSC_FULL_VER) && defined(_XCR_XFEATURE_ENABLED_MASK)
> + return _xgetbv(_XCR_XFEATURE_ENABLED_MASK);
> +#else
> + return 0;
> +#endif
> +
> +}
> #endif /* X86 or X86_64 */
>
> void
> @@ -284,7 +305,9 @@ util_cpu_detect(void)
> util_cpu_caps.has_sse4_1 = (regs2[2] >> 19) & 1;
> util_cpu_caps.has_sse4_2 = (regs2[2] >> 20) & 1;
> util_cpu_caps.has_popcnt = (regs2[2] >> 23) & 1;
> - util_cpu_caps.has_avx = (regs2[2] >> 28) & 1;
> + util_cpu_caps.has_avx = ((regs2[2] >> 28) & 1) && // AVX
> + ((regs2[2] >> 27) & 1) && // OSXSAVE
> + ((xgetbv() & 6) == 6); // XMM & YMM
> util_cpu_caps.has_f16c = (regs2[2] >> 29) & 1;
> util_cpu_caps.has_mmx2 = util_cpu_caps.has_sse; /* SSE cpus supports mmxext too */
>
>
Looks good to me, though it's a pity that detection depends on the compiler.
Granted, it looks like icc currently won't work, but still...
I guess that technically the test for sse(x) isn't correct either, as
that too requires OS support. I don't know offhand how to check
for it, though (and we'd be talking about an ANCIENT OS here...).
Roland
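
For reference, the three-part check from the quoted commit message (AVX
cpuid bit, OSXSAVE cpuid bit, XCR0 bits 1 and 2) can be sketched as a pure
function that does not depend on compiler-specific asm or intrinsics. This
is only an illustrative sketch, not the code from the patch: the name
avx_usable_sketch and the macro names are hypothetical, and the caller is
assumed to supply ECX from cpuid leaf 1 plus the XCR0 value (passing 0 when
OSXSAVE is clear, since xgetbv must not be executed in that case).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in ECX of cpuid leaf 1 (same bits the patch tests). */
#define CPUID1_ECX_OSXSAVE (1u << 27)  /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28)  /* CPU implements AVX */

/* Bit positions in XCR0, as returned by xgetbv with ECX = 0. */
#define XCR0_SSE_STATE (1u << 1)       /* OS saves/restores XMM regs */
#define XCR0_AVX_STATE (1u << 2)       /* OS saves/restores YMM regs */

/*
 * Hypothetical helper: returns true only if the CPU reports AVX, the OS
 * has enabled XSAVE, and the OS preserves both XMM and YMM state across
 * context switches. The xcr0 argument is only meaningful when OSXSAVE is
 * set; callers should pass 0 otherwise.
 */
static bool
avx_usable_sketch(uint32_t cpuid1_ecx, uint64_t xcr0)
{
   const uint64_t need = XCR0_SSE_STATE | XCR0_AVX_STATE;

   if (!(cpuid1_ecx & CPUID1_ECX_AVX))
      return false;
   if (!(cpuid1_ecx & CPUID1_ECX_OSXSAVE))
      return false;

   /* Same condition as (xgetbv() & 6) == 6 in the patch. */
   return (xcr0 & need) == need;
}
```

Keeping the decision logic separate from the register reads means only the
two small accessors (cpuid and xgetbv) need per-compiler variants.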