[Mesa-dev] [PATCH 3/3] util: Use SSE rounding on all platforms that support it.

Sun Aug 9 11:00:37 PDT 2015

Forgot to mention: strictly speaking, only __SSE__ is necessary for
_mesa_lroundevenf, so it would work on those shiny Pentium 3 and Athlon
XP chips... The double version (though it's unused), however, requires
__SSE2__.

Roland
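
A minimal sketch of the __SSE__ / __SSE2__ split described above, not the
code from the patch. The names sketch_lroundevenf and sketch_lroundeven are
hypothetical, the lrintf/lrint fallback is an assumption for builds without
SSE, and the result is only round-half-to-even while the default rounding
mode is in effect:

```c
#include <limits.h>
#include <math.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE: _mm_load_ss, _mm_cvtss_si32/_si64 */
#endif
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2: _mm_load_sd, _mm_cvtsd_si32/_si64 */
#endif

/* Float variant: SSE alone is enough (hypothetical name). */
static inline long
sketch_lroundevenf(float x)
{
#if defined(__SSE__) && defined(__x86_64__) && LONG_MAX > 2147483647L
   /* The 64-bit conversion only exists in 64-bit mode. */
   return _mm_cvtss_si64(_mm_load_ss(&x));
#elif defined(__SSE__)
   return _mm_cvtss_si32(_mm_load_ss(&x));
#else
   /* Assumed fallback: lrintf honours the current rounding mode. */
   return lrintf(x);
#endif
}

/* Double variant: needs SSE2 (hypothetical name). */
static inline long
sketch_lroundeven(double x)
{
#if defined(__SSE2__) && defined(__x86_64__) && LONG_MAX > 2147483647L
   return _mm_cvtsd_si64(_mm_load_sd(&x));
#elif defined(__SSE2__)
   return _mm_cvtsd_si32(_mm_load_sd(&x));
#else
   return lrint(x);
#endif
}
```

The width check uses LONG_MAX rather than LONG_BIT only so the sketch does
not depend on which feature-test macros expose LONG_BIT.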

On 09.08.2015 at 19:23, Roland Scheidegger wrote:
> On 09.08.2015 at 18:47, Matt Turner wrote:
>> On Sun, Aug 9, 2015 at 3:57 AM, Jose Fonseca <jfonseca at vmware.com> wrote:
>>> As currently only GCC x86_64 builds were using it.
>>> ---
>>>  src/util/rounding.h | 16 +++++++++++++---
>>>  1 file changed, 13 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/util/rounding.h b/src/util/rounding.h
>>> index ec31b47..38c1c2f 100644
>>> --- a/src/util/rounding.h
>>> +++ b/src/util/rounding.h
>>> @@ -27,7 +27,17 @@
>>>  #include <math.h>
>>>  #include <limits.h>
>>>
>>> -#ifdef __x86_64__
>>> +/* SSE2 is supported on: all x86_64 targets, on x86 targets when -msse2 is
>>> + * passed to GCC, and should also be enabled on all Windows builds. */
>>> +#if defined(__x86_64__) /* gcc */ || \
>>> +    defined(_M_X64) /* msvc */ || \
>>> +    defined(_M_AMD64) /* msvc */ || \
>>> +    defined(__SSE2__) /* gcc -msse2 */ || \
>>
>> I don't think we should include __SSE2__ in this. On x86-32,
>> floating-point operations will be using the x87 FPU, so using SSE
>> intrinsics will force some transfers to and from memory.
> My understanding is that you will only get __SSE2__ if you actually used
> -msse2 to build; otherwise it will not be defined. And in that case
> gcc will use sse/sse2 for all float math, so this is quite appropriate.
> 
> As for _WIN32, I think we always enable sse math on windows.
> scons gallium.py always passes /arch:SSE2 for windows msvc builds (and
> mentions it's the default for msvc 2012 anyway), again meaning these
> builds will not touch the x87 fpu (and will not run on your 80486...).
> Actually, reading some MS docs, the compiler may well choose to run a mix
> of x87 and sse/sse2, but in that case there'd be no way to know whether the
> sse conversion instruction should be used anyway.
> Don't ask me about other compilers on windows, though...
> I guess this does mean that you could not make mesa run on such chips
> just by changing compile flags, even if you wanted to, but I don't know
> if that's really a problem. (I think the problem here is that if you use
> something like /arch:AVX then __AVX__ does get defined by msvc,
> whereas for /arch:SSE2 no __SSE2__ macro (or any other) gets
> defined (boo...). Though I guess it could be defined manually for such
> builds in the build infrastructure, depending on the /arch flag used
> for compiling.)
> 
> Roland
> 
> 
>>
>>> +    defined(_WIN32)
>>> +#define HAVE_SSE2 1
>>
>> Does MSVC define __amd64, __amd64__, or __x86_64? The AMD64 ABI
>> document says that these, along with __x86_64__, should be defined
>> when compiling for x86-64. With the removal of __SSE2__ above, I'd
>> make this
>>
>> #ifndef __x86_64__
>> #if defined(_M_X64) /* msvc */ || \
>>     defined(_M_AMD64) /* msvc */ || \
>>     defined(_WIN32)
>> #define __x86_64__
>> #endif
>> #endif
>>
>>> +#endif
>>> +
>>> +#ifdef HAVE_SSE2
>>>  #include <xmmintrin.h>
>>>  #include <emmintrin.h>
>>>  #endif
>>> @@ -93,7 +103,7 @@ _mesa_roundeven(double x)
>>>  static inline long
>>>  _mesa_lroundevenf(float x)
>>>  {
>>> -#ifdef __x86_64__
>>> +#ifdef HAVE_SSE2
>>>  #if LONG_BIT == 64
>>>     return _mm_cvtss_si64(_mm_load_ss(&x));
>>>  #elif LONG_BIT == 32 || defined(_WIN32)
>>> @@ -113,7 +123,7 @@ _mesa_lroundevenf(float x)
>>>  static inline long
>>>  _mesa_lroundeven(double x)
>>>  {
>>> -#ifdef __x86_64__
>>> +#ifdef HAVE_SSE2
>>>  #if LONG_BIT == 64
>>>     return _mm_cvtsd_si64(_mm_load_sd(&x));
>>>  #elif LONG_BIT == 32 || defined(_WIN32)
>>> --
>>> 2.1.4
>>>
>
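
On the MSVC question quoted above: MSVC's predefined-macro documentation
lists _M_IX86_FP for 32-bit x86 targets (0 for /arch:IA32, 1 for /arch:SSE,
2 for /arch:SSE2 and higher), which might serve as a stand-in for the
missing __SSE2__. A rough sketch under that assumption; SKETCH_HAVE_SSE2 and
sketch_lroundeven_i32 are hypothetical names, not what the patch uses, and
the lrint fallback is likewise an assumption:

```c
#include <math.h>

/* Detect SSE2 from the compiler's own macros instead of from _WIN32. */
#if defined(__x86_64__) /* gcc */ || \
    defined(_M_X64) /* msvc */ || \
    defined(_M_AMD64) /* msvc */ || \
    defined(__SSE2__) /* gcc -msse2 */ || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2) /* msvc /arch:SSE2 or higher */
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

#if SKETCH_HAVE_SSE2
#include <emmintrin.h>
#endif

/* Round a double to the nearest int, ties to even under the default
 * rounding mode. The 32-bit conversion is used so the same code works
 * where long is 32 bits wide, as on Windows. */
static inline int
sketch_lroundeven_i32(double x)
{
#if SKETCH_HAVE_SSE2
   return _mm_cvtsd_si32(_mm_load_sd(&x));
#else
   /* Assumed fallback for builds without SSE2. */
   return (int)lrint(x);
#endif
}
```

With a check like this, an x87-only 32-bit build would fall back to lrint
instead of being forced onto the SSE path by _WIN32 alone.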