[Mesa-dev] [PATCH] gallium/util: Fix detection of AVX cpu caps

Andre Heider a.heider at gmail.com
Wed Jul 24 01:43:51 PDT 2013


On Wed, Jul 24, 2013 at 12:11 AM, Jose Fonseca <jfonseca at vmware.com> wrote:
>
>
> ----- Original Message -----
>> On Tue, Jul 23, 2013 at 8:57 PM, Roland Scheidegger <sroland at vmware.com>
>> wrote:
>> > On 23.07.2013 19:08, Andre Heider wrote:
>> >> For AVX it's not sufficient to only rely on the cpuid flags. If the CPU
>> >> supports these extensions, but the OS doesn't, issuing these insns will
>> >> trigger an undefined opcode exception.
>> >>
>> >> In addition to the AVX cpuid bit we also need to:
>> >> * test cpuid for OSXSAVE support
>> >> * XGETBV to check if the OS saves/restores AVX regs on context switches
>> >>
>> >> See "Detecting Availability and Support" at
>> >> http://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions
>> >>
>> >> Signed-off-by: Andre Heider <a.heider at gmail.com>
>> >> ---
>> >>  src/gallium/auxiliary/util/u_cpu_detect.c | 27 +++++++++++++++++++++++++--
>> >>  1 file changed, 25 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/src/gallium/auxiliary/util/u_cpu_detect.c b/src/gallium/auxiliary/util/u_cpu_detect.c
>> >> index b118fc8..588fc7c 100644
>> >> --- a/src/gallium/auxiliary/util/u_cpu_detect.c
>> >> +++ b/src/gallium/auxiliary/util/u_cpu_detect.c
>> >> @@ -67,7 +67,7 @@
>> >>
>> >>  #if defined(PIPE_OS_WINDOWS)
>> >>  #include <windows.h>
>> >> -#if defined(MSVC)
>> >> +#if defined(PIPE_CC_MSVC)
>> >>  #include <intrin.h>
>> >>  #endif
>> >>  #endif
>> >> @@ -211,6 +211,27 @@ cpuid(uint32_t ax, uint32_t *p)
>> >>     p[3] = 0;
>> >>  #endif
>> >>  }
>> >> +
>> >> +static INLINE uint64_t xgetbv(void)
>> >> +{
>> >> +#if defined(PIPE_CC_GCC)
>> >> +   uint32_t eax, edx;
>> >> +
>> >> +   __asm __volatile (
>> >> +     ".byte 0x0f, 0x01, 0xd0" // xgetbv isn't supported on gcc < 4.4
>> >> +     : "=a"(eax),
>> >> +       "=d"(edx)
>> >> +     : "c"(0)
>> >> +   );
>> >> +
>> >> +   return ((uint64_t)edx << 32) | eax;
>> >> +#elif defined(PIPE_CC_MSVC) && defined(_MSC_FULL_VER) && defined(_XCR_XFEATURE_ENABLED_MASK)
>> >> +   return _xgetbv(_XCR_XFEATURE_ENABLED_MASK);
>> >> +#else
>> >> +   return 0;
>> >> +#endif
>> >> +
>> >> +}
>> >>  #endif /* X86 or X86_64 */
>> >>
>> >>  void
>> >> @@ -284,7 +305,9 @@ util_cpu_detect(void)
>> >>           util_cpu_caps.has_sse4_1 = (regs2[2] >> 19) & 1;
>> >>           util_cpu_caps.has_sse4_2 = (regs2[2] >> 20) & 1;
>> >>           util_cpu_caps.has_popcnt = (regs2[2] >> 23) & 1;
>> >> -         util_cpu_caps.has_avx    = (regs2[2] >> 28) & 1;
>> >> +         util_cpu_caps.has_avx    = ((regs2[2] >> 28) & 1) && // AVX
>> >> +                                    ((regs2[2] >> 27) & 1) && // OSXSAVE
>> >> +                                    ((xgetbv() & 6) == 6);    // XMM & YMM
>> >>           util_cpu_caps.has_f16c   = (regs2[2] >> 29) & 1;
>> >>           util_cpu_caps.has_mmx2   = util_cpu_caps.has_sse; /* SSE cpus supports mmxext too */
>> >>
>> >>
>> >
>> > Looks good to me though
>
> Looks good to me too. Thanks.
>
>> > it's a pity detection depends on compiler.
>> > Granted it looks like icc currently won't work but still...
>> > I guess that technically the test for sse(x) isn't correct either, as
>> > that too requires OS support. I don't know off-hand how to check
>> > for it, though (and we'd be talking ANCIENT OS here...).
>>
>> Ancient indeed ;)
>>
>> But with AVX the problem becomes more urgent: all SSE versions used
>> the same registers, while AVX extended them.
>> We recently got an AVX-enabled vSphere server, and exposing that to
>> XP guests doesn't go too well with llvmpipe without this patch.
>
> I don't know of many llvmpipe Windows users, especially on XP. If it's not confidential, how are you using it?

I think the situation can be compared to GNOME Shell/Chrome/Qt5: it's
used for remote sessions and serves as a fallback in case the native
GL driver sucks too much.
According to our render guy: while there's an MS software renderer, it
only implements GL 1.1. That means no multitexturing, NPOT restrictions,
additional code paths, testing those paths... llvmpipe makes life
easier there.
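
For anyone skimming the thread, the check the patch adds boils down to
roughly the following. This is only a standalone sketch, not the code
from u_cpu_detect.c: avx_usable() is a hypothetical helper name, and it
takes the CPUID.1:ECX value and the XGETBV(0) result as plain arguments
so the logic can be read (and tested) without any inline asm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of Mesa: decides whether AVX can be
 * used, given CPUID leaf 1 ECX and the XCR0 value read via XGETBV(0).
 * How those two values are obtained is left to the caller. */
static bool
avx_usable(uint32_t cpuid1_ecx, uint64_t xcr0)
{
   bool has_avx     = (cpuid1_ecx >> 28) & 1; /* CPU implements AVX */
   bool has_osxsave = (cpuid1_ecx >> 27) & 1; /* OS enabled XSAVE/XGETBV */

   /* XGETBV must not be issued (and XCR0 means nothing) without OSXSAVE. */
   if (!has_avx || !has_osxsave)
      return false;

   /* Bit 1 = XMM state, bit 2 = YMM state; the OS has to save and
    * restore both on context switches. */
   return (xcr0 & 0x6) == 0x6;
}
```

An XP guest on an AVX host would report bit 28 set but leave OSXSAVE
clear, so the helper returns false there and the AVX code paths are
never selected.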

Regards,
Andre

