[Mesa-dev] [PATCH 3/3] util: Use SSE rounding on all platforms that support it.

Sun Aug 9 10:23:18 PDT 2015

On 09.08.2015 at 18:47, Matt Turner wrote:
> On Sun, Aug 9, 2015 at 3:57 AM, Jose Fonseca <jfonseca at vmware.com> wrote:
>> As currently only GCC x86_64 builds were using it.
>> ---
>>  src/util/rounding.h | 16 +++++++++++++---
>>  1 file changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/src/util/rounding.h b/src/util/rounding.h
>> index ec31b47..38c1c2f 100644
>> --- a/src/util/rounding.h
>> +++ b/src/util/rounding.h
>> @@ -27,7 +27,17 @@
>>  #include <math.h>
>>  #include <limits.h>
>>
>> -#ifdef __x86_64__
>> +/* SSE2 is supported on: all x86_64 targets, on x86 targets when -msse2 is
>> + * passed to GCC, and should also be enabled on all Windows builds. */
>> +#if defined(__x86_64__) /* gcc */ || \
>> +    defined(_M_X64) /* msvc */ || \
>> +    defined(_M_AMD64) /* msvc */ || \
>> +    defined(__SSE2__) /* gcc -msse2 */ || \
> 
> I don't think we should include __SSE2__ in this. On x86-32,
> floating-point operations will be using the x87 FPU, so using SSE
> intrinsics will force some transfers to and from memory.
My understanding is that __SSE2__ is only defined if you actually built
with -msse2; otherwise it will not be defined. And in that case gcc will
use sse/sse2 for all float math, so this is quite appropriate.
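
To make the distinction concrete, here is a rough sketch of a compile-time
probe. It is not part of the patch, and float_math_unit() is a made-up name.
It assumes GCC's __SSE2_MATH__ macro, which as far as I know is only defined
when scalar float math really goes through sse2 (-mfpmath=sse), whereas
__SSE2__ alone only says the instructions are available:

```c
/* Hypothetical probe, not part of the patch: reports what the predefined
 * macros say about the unit used for scalar floating-point math. */
static const char *
float_math_unit(void)
{
#if defined(__x86_64__) || defined(__SSE2_MATH__)
   /* x86_64 always uses sse2 for scalar math; __SSE2_MATH__ is assumed to
    * mean the same on 32-bit GCC builds. */
   return "sse2";
#elif defined(__SSE2__)
   /* sse2 instructions may be emitted, but scalar math could still be x87. */
   return "sse2 available";
#else
   return "unknown";
#endif
}
```

If that assumption holds, testing __SSE2_MATH__ instead of __SSE2__ might
address the x87 transfer concern on x86-32 without dropping the check.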

As for the _WIN32 check, I think we always enable sse math for windows.
scons gallium.py always passes /arch:SSE2 for windows msvc builds (and
mentions it's the default for msvc 2012 anyway), again meaning these
builds will not touch the x87 fpu (and will not run on your 80486...).
Actually, reading some MS docs, the compiler may well choose to emit a mix
of x87 and sse/sse2 code, but in that case there'd be no way to know
whether the sse conversion instruction should be used anyway.
Don't ask me about other compilers on windows, though...
I guess this does mean that you could not make mesa run on chips without
sse2 just by changing compile flags, even if you wanted to, but I don't
know if that's really a problem. (I think the problem here is that with
something like /arch:AVX msvc would indeed define __AVX__, whereas for
/arch:SSE2 no __SSE2__ macro (or any other) gets defined (boo...). Though
I guess it could be defined manually for such builds, depending on the
/arch flag used for compiling, somewhere in the build infrastructure.)
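
One possible alternative to defining something by hand, as a sketch only: as
far as I can tell, MSVC's predefined-macro list includes _M_IX86_FP for
32-bit x86 builds, with a value of 2 for /arch:SSE2 and above. I have not
checked this against the compilers we care about, and SKETCH_MSVC_SSE2 is a
made-up name (the patch itself uses HAVE_SSE2):

```c
/* Hypothetical detection for MSVC, not part of the patch.
 * Assumes _M_IX86_FP >= 2 means /arch:SSE2 or higher on 32-bit x86. */
#if defined(_M_X64) || defined(_M_AMD64) || \
    (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define SKETCH_MSVC_SSE2 1
#else
#define SKETCH_MSVC_SSE2 0
#endif
```

On non-MSVC compilers none of these macros are defined, so the sketch
evaluates to 0 there and the GCC checks would still be needed.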

Roland

> 
>> +    defined(_WIN32)
>> +#define HAVE_SSE2 1
> 
> Does MSVC define __amd64, __amd64__, or __x86_64? The AMD64 ABI
> document says these, and __x86_64__ should be defined if compiling on
> x86-64. With the removal of __SSE2__ above, I'd make this
> 
> #ifndef __x86_64__
> #if defined(_M_X64) /* msvc */ || \
>     defined(_M_AMD64) /* msvc */ || \
>     defined(_WIN32)
> #define __x86_64__
> #endif
> #endif
> 
>> +#endif
>> +
>> +#ifdef HAVE_SSE2
>>  #include <xmmintrin.h>
>>  #include <emmintrin.h>
>>  #endif
>> @@ -93,7 +103,7 @@ _mesa_roundeven(double x)
>>  static inline long
>>  _mesa_lroundevenf(float x)
>>  {
>> -#ifdef __x86_64__
>> +#ifdef HAVE_SSE2
>>  #if LONG_BIT == 64
>>     return _mm_cvtss_si64(_mm_load_ss(&x));
>>  #elif LONG_BIT == 32 || defined(_WIN32)
>> @@ -113,7 +123,7 @@ _mesa_lroundevenf(float x)
>>  static inline long
>>  _mesa_lroundeven(double x)
>>  {
>> -#ifdef __x86_64__
>> +#ifdef HAVE_SSE2
>>  #if LONG_BIT == 64
>>     return _mm_cvtsd_si64(_mm_load_sd(&x));
>>  #elif LONG_BIT == 32 || defined(_WIN32)
>> --
>> 2.1.4
>>
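
For reference, a minimal standalone sketch of what the sse path is relied on
to do: with the default rounding mode, the conversion instruction rounds
halfway cases to the nearest even integer. The names sketch_lroundevenf and
SKETCH_HAVE_SSE2 are made up for illustration, and unlike the patch this only
uses the 32-bit conversion, so it is limited to the int range:

```c
#include <math.h>

#if defined(__x86_64__) || defined(_M_X64) || defined(_M_AMD64) || \
    defined(__SSE2__)
#include <xmmintrin.h>
#include <emmintrin.h>
#define SKETCH_HAVE_SSE2 1
#endif

/* Hypothetical stand-in for _mesa_lroundevenf(), for illustration only. */
static inline long
sketch_lroundevenf(float x)
{
#ifdef SKETCH_HAVE_SSE2
   /* cvtss2si rounds according to MXCSR, which defaults to nearest-even. */
   return _mm_cvtss_si32(_mm_load_ss(&x));
#else
   /* Fallback: lrintf() honours the current rounding mode, which also
    * defaults to nearest-even. */
   return lrintf(x);
#endif
}
```

With the default rounding mode, 0.5f and 2.5f should come out as 0 and 2,
while 1.5f comes out as 2.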