[Mesa-dev] [PATCH 5/7] util: Use SSE intrinsics in _mesa_lroundeven{f, }.

Roland Scheidegger sroland at vmware.com
Fri Jul 31 18:37:20 PDT 2015


On 01.08.2015 at 03:02, Matt Turner wrote:
> On Fri, Jul 31, 2015 at 5:50 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>> On 01.08.2015 at 01:26, Matt Turner wrote:
>>> gcc actually generates this for us now that we use -fno-math-errno
>>> (which is weird, since lrintf()/lrint() don't set errno) but clang still
>>> does not. Presumably helps MSVC as well.
>>>
>>> Reduced .text size by 8.5k with gcc before -fno-math-errno.
>>>
>>>    text     data      bss      dec      hex  filename
>>> 4935850   195136    26192  5157178   4eb13a  i965_dri.so before
>>> 4927225   195128    26192  5148545   4e8f81  i965_dri.so after
>>> ---
>>>  src/util/rounding.h | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/src/util/rounding.h b/src/util/rounding.h
>>> index 2d00760..e546c9f 100644
>>> --- a/src/util/rounding.h
>>> +++ b/src/util/rounding.h
>>> @@ -26,6 +26,11 @@
>>>
>>>  #include <math.h>
>>>
>>> +#ifdef __x86_64__
>>> +#include <xmmintrin.h>
>>> +#include <emmintrin.h>
>>> +#endif
>>> +
>>>  #ifdef __SSE4_1__
>>>  #include <smmintrin.h>
>>>  #endif
>>> @@ -87,7 +92,11 @@ _mesa_roundeven(double x)
>>>  static inline long
>>>  _mesa_lroundevenf(float x)
>>>  {
>>> +#ifdef __x86_64__
>>> +   return _mm_cvtss_si64(_mm_load_ss(&x));
>> I think you really want _mm_cvtss_si32, not 64. Longs tend to be 32bit.
>> _mm_cvtss_si64 would be the equivalent of llrintf.
> 
> long is 64-bits on Linux/amd64. Looks like it's 32-bits on x32 and
> Windows though.
You are of course totally right.
The actual assembly looks pretty much the same of course (cvtss2si
%xmm0,%rax is the 64bit version...).
Another solution would be to make this function return an int, as all
callers (so far) expect ints anyway. Those callers may now get different
results on overflow (both negative and positive), since they receive the
lower 32 bits, whereas before this change F_TO_I and IROUND actually
produced the integer indefinite value (0x80000000), at least with SSE.
While an undefined result certainly allows random numbers, getting them
may make some bugs slightly harder to track down.
But either way looks ok to me. I can't say I like that datatype, though;
I prefer datatypes with known sizes to those with surprising differences
between platforms :-).
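A minimal sketch of the "return an int" alternative, assuming x86-64 (gcc/clang `__x86_64__` or MSVC `_M_X64`) for the SSE path and the default round-to-nearest-even mode. The name `roundevenf_to_int` is hypothetical, not an existing Mesa function. With a 32-bit destination, cvtss2si returns the integer indefinite value 0x80000000 for out-of-range inputs, which is the old F_TO_I/IROUND behaviour mentioned above rather than the low 32 bits of a 64-bit result.

```c
#include <math.h>
#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>   /* _mm_load_ss, _mm_cvtss_si32 */
#endif

/* Hypothetical int-returning variant of _mesa_lroundevenf().
 * Rounds to nearest, ties to even, assuming the default rounding
 * mode (MXCSR / FP environment) is in effect. */
static inline int
roundevenf_to_int(float x)
{
#if defined(__x86_64__) || defined(_M_X64)
   /* cvtss2si with a 32-bit destination: inputs outside the int range
    * (and NaN) produce the "integer indefinite" value 0x80000000,
    * i.e. INT_MIN. */
   return _mm_cvtss_si32(_mm_load_ss(&x));
#else
   /* Portable fallback: lrintf() also honours the current rounding
    * mode; the result for out-of-range inputs is unspecified here. */
   return (int) lrintf(x);
#endif
}
```

For example, 2.5f rounds to 2 and 3.5f to 4; on the SSE path an out-of-range input such as 3e9f yields INT_MIN.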

> 
> I guess I need to do
> 
> #ifdef __x86_64__
> #if LONG_BIT == 64
>    return _mm_cvtss_si64(_mm_load_ss(&x));
> #elif LONG_BIT == 32
>    return _mm_cvtss_si32(_mm_load_ss(&x));
> #endif
> #endif
> 
> I'll change it to that.
Would x32 actually have __x86_64__ set?
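A minimal sketch of the width dispatch proposed above, assuming gcc/clang or MSVC on x86-64. It tests `LONG_MAX` from ISO C `<limits.h>` instead of `LONG_BIT`, which is an XSI extension that may not be defined; if it were undefined, `#if LONG_BIT == 64` would evaluate it as 0 and neither branch would be taken, silently. gcc and clang define `__x86_64__` for x32 as well (together with `__ILP32__`), so it is the `LONG_MAX` check, not `__x86_64__` alone, that selects the 32-bit conversion there and on 64-bit Windows. The function names are hypothetical stand-ins for `_mesa_lroundevenf()` and `_mesa_lroundeven()`.

```c
#include <limits.h>
#include <math.h>
#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>   /* _mm_load_ss, _mm_cvtss_si32, _mm_cvtss_si64 */
#include <emmintrin.h>   /* _mm_load_sd, _mm_cvtsd_si32, _mm_cvtsd_si64 */
#endif

/* Hypothetical stand-in for _mesa_lroundevenf(). */
static inline long
lroundevenf_sketch(float x)
{
#if defined(__x86_64__) || defined(_M_X64)
#if LONG_MAX > INT_MAX
   /* 64-bit long (LP64, e.g. Linux/amd64): cvtss2si into a 64-bit reg. */
   return _mm_cvtss_si64(_mm_load_ss(&x));
#else
   /* 32-bit long (x32, 64-bit Windows): cvtss2si into a 32-bit reg. */
   return _mm_cvtss_si32(_mm_load_ss(&x));
#endif
#else
   return lrintf(x);
#endif
}

/* Hypothetical stand-in for _mesa_lroundeven(). */
static inline long
lroundeven_sketch(double x)
{
#if defined(__x86_64__) || defined(_M_X64)
#if LONG_MAX > INT_MAX
   return _mm_cvtsd_si64(_mm_load_sd(&x));
#else
   return _mm_cvtsd_si32(_mm_load_sd(&x));
#endif
#else
   return lrint(x);
#endif
}
```

Comparing `LONG_MAX` against `INT_MAX` asks the question the code actually cares about (is `long` wider than 32 bits?) and works the same on every data model, so no separate x32 or Windows case is needed.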

Roland


> 
> Thanks!
> 


