[Mesa-dev] [PATCH 5/7] util: Use SSE intrinsics in _mesa_lroundeven{f, }.

Mon Aug 3 09:39:00 PDT 2015

On Fri, Jul 31, 2015 at 6:37 PM, Roland Scheidegger <sroland at vmware.com> wrote:
> Am 01.08.2015 um 03:02 schrieb Matt Turner:
>> On Fri, Jul 31, 2015 at 5:50 PM, Roland Scheidegger <sroland at vmware.com> wrote:
>>> Am 01.08.2015 um 01:26 schrieb Matt Turner:
>>>> gcc actually generates this for us now that we use -fno-math-errno
>>>> (which is weird, since lrintf()/lrint() don't set errno), but clang still
>>>> does not. Presumably this helps MSVC as well.
>>>>
>>>> Reduced .text size by 8.5k with gcc before -fno-math-errno.
>>>>
>>>>    text     data      bss      dec      hex  filename
>>>> 4935850   195136    26192  5157178   4eb13a  i965_dri.so before
>>>> 4927225   195128    26192  5148545   4e8f81  i965_dri.so after
>>>> ---
>>>>  src/util/rounding.h | 13 +++++++++++++
>>>>  1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/src/util/rounding.h b/src/util/rounding.h
>>>> index 2d00760..e546c9f 100644
>>>> --- a/src/util/rounding.h
>>>> +++ b/src/util/rounding.h
>>>> @@ -26,6 +26,11 @@
>>>>
>>>>  #include <math.h>
>>>>
>>>> +#ifdef __x86_64__
>>>> +#include <xmmintrin.h>
>>>> +#include <emmintrin.h>
>>>> +#endif
>>>> +
>>>>  #ifdef __SSE4_1__
>>>>  #include <smmintrin.h>
>>>>  #endif
>>>> @@ -87,7 +92,11 @@ _mesa_roundeven(double x)
>>>>  static inline long
>>>>  _mesa_lroundevenf(float x)
>>>>  {
>>>> +#ifdef __x86_64__
>>>> +   return _mm_cvtss_si64(_mm_load_ss(&x));
>>> I think you really want _mm_cvtss_si32, not 64. Longs tend to be 32-bit.
>>> _mm_cvtss_si64 would be the equivalent of llrintf.
>>
>> long is 64 bits on Linux/amd64. Looks like it's 32 bits on x32 and
>> Windows, though.
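A minimal sketch of the two conversions under discussion, assuming an x86-64 target built with GCC or Clang. Both intrinsics compile to cvtss2si and round according to the MXCSR rounding mode (round-to-nearest, ties-to-even by default, which is what lrintf()/llrintf() do under the default FE_TONEAREST mode); they differ only in the width of the destination. The helper names cvt_si32 and cvt_si64 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h> /* _mm_load_ss, _mm_cvtss_si32, _mm_cvtss_si64 */

#ifndef __x86_64__
#error "_mm_cvtss_si64 is only available when targeting x86-64"
#endif

/* _mm_load_ss reads exactly one 32-bit float from memory. */
static_assert(sizeof(float) == 4, "expected a 32-bit float");

/* cvtss2si with a 32-bit destination: behaves like lrintf() where long
 * is 32 bits; out-of-range inputs give 0x80000000 (INT32_MIN). */
static inline int32_t
cvt_si32(float x)
{
   return _mm_cvtss_si32(_mm_load_ss(&x));
}

/* cvtss2si with a 64-bit destination (e.g. %rax): behaves like llrintf(),
 * or like lrintf() where long is 64 bits. */
static inline int64_t
cvt_si64(float x)
{
   return _mm_cvtss_si64(_mm_load_ss(&x));
}
```

For example, cvt_si32(2.5f) and cvt_si32(3.5f) give 2 and 4 (ties go to the even neighbour). 5e9f is exactly representable as a float, so cvt_si64(5e9f) gives 5000000000, while cvt_si32(5e9f) is out of range and gives INT32_MIN.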
> You are of course totally right.
> The actual assembly looks pretty much the same, of course (cvtss2si
> %xmm0,%rax is the 64-bit version...).
> Another solution would be to make this function return an int, as all
> callers (so far) expect ints anyway. (Those callers may now get different
> results on overflow, both negative and positive, since they now receive
> the lower 32 bits, whereas previously F_TO_I and IROUND produced the
> integer indefinite value (0x80000000), at least with SSE. An undefined
> result certainly permits arbitrary numbers, but arbitrary-looking values
> may make some bugs slightly harder to track down.)
> But either way looks OK to me. I can't say I like that data type, though:
> I prefer types with known sizes over ones with surprising
> differences :-).

Yeah, I don't like "long" either. I haven't come up with a reason why
the float->int libc routines return it. It's (a) not always big
enough, and (b) the "long long" routines are often exactly the same.

I really just wanted to wrap lrintf and friends and then to match the
behavior with SSE intrinsics. I guess we could do that and always
return int as well...
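If the helper returned int, the overflow difference Roland describes can be seen directly. A minimal sketch, assuming x86-64, a 32-bit int, and GCC or Clang (which define converting an out-of-range integer to a narrower signed type as reduction modulo 2^32); both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <xmmintrin.h> /* _mm_load_ss, _mm_cvtss_si32, _mm_cvtss_si64 */

#ifndef __x86_64__
#error "this sketch uses _mm_cvtss_si64, which requires x86-64"
#endif

/* The narrowing result described below assumes a 32-bit int. */
static_assert(sizeof(int) * CHAR_BIT == 32, "expected a 32-bit int");

/* Hypothetical int-returning variant: 32-bit cvtss2si, so any
 * out-of-range input, positive or negative, yields INT_MIN (0x80000000). */
static inline int
roundevenf_to_int(float x)
{
   return _mm_cvtss_si32(_mm_load_ss(&x));
}

/* A 64-bit conversion narrowed to int: out-of-range inputs keep only the
 * low 32 bits, which can look like an arbitrary in-range value. */
static inline int
roundevenf_narrowed(float x)
{
   return (int) _mm_cvtss_si64(_mm_load_ss(&x));
}
```

For 3e9f (exactly representable as a float), roundevenf_to_int() returns INT_MIN, whereas roundevenf_narrowed() returns -1294967296, i.e. 3000000000 - 2^32.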

>
>>
>> I guess I need to do
>>
>> #ifdef __x86_64__
>> #if LONG_BIT == 64
>>    return _mm_cvtss_si64(_mm_load_ss(&x));
>> #elif LONG_BIT == 32
>>    return _mm_cvtss_si32(_mm_load_ss(&x));
>> #endif
>> #endif
>>
>> I'll change it to that.
> Would x32 actually have __x86_64__ set?

Yes, much to the displeasure of many people.
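Since x32 defines __x86_64__ but has a 32-bit long, the width of long has to be tested separately from the architecture. Below is a minimal sketch of the proposed change under two assumptions: it tests LONG_MAX from <limits.h> instead of LONG_BIT (LONG_BIT is an XSI extension that may not be visible without feature-test macros, and an undefined macro evaluates to 0 in #if, so neither branch would be selected), and it falls back to lrintf() on other targets. The function name lroundevenf_sketch is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <math.h>          /* lrintf(): fallback on non-x86-64 targets */
#ifdef __x86_64__
#include <xmmintrin.h>     /* _mm_load_ss, _mm_cvtss_si32, _mm_cvtss_si64 */

/* x32 also defines __x86_64__, so check the width of long explicitly. */
static_assert(LONG_MAX == INT64_MAX || LONG_MAX == INT32_MAX,
              "long is neither 32 nor 64 bits wide");
#endif

/* Hypothetical name. Like lrintf(), this uses the current rounding mode,
 * i.e. round-to-nearest, ties-to-even unless it has been changed. */
static inline long
lroundevenf_sketch(float x)
{
#if defined(__x86_64__) && LONG_MAX == INT64_MAX
   return _mm_cvtss_si64(_mm_load_ss(&x));   /* LP64, e.g. Linux/amd64 */
#elif defined(__x86_64__)
   return _mm_cvtss_si32(_mm_load_ss(&x));   /* 32-bit long, e.g. x32 */
#else
   return lrintf(x);                         /* other architectures */
#endif
}
```

On Linux/amd64 this takes the 64-bit branch, so lroundevenf_sketch(5e9f) returns 5000000000; on x32 the same source selects the 32-bit conversion.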