[Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf

Wed Nov 14 02:03:01 UTC 2018

Am 13.11.18 um 23:49 schrieb Dylan Baker:
> Quoting Roland Scheidegger (2018-11-13 14:13:00)
>> Am 13.11.18 um 18:00 schrieb Dylan Baker:
>>> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
>>>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
>>>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
>>>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
>>>>>>> Which has the same behavior.
>>>>>>
>>>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest
>>>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds
>>>>>> to
>>>>>> the nearest *even* value regardless of the FPU rounding mode, no?
>>>>>>
>>>>>> I'm not sure if it matters or not, but *at least* point that out in
>>>>>> the
>>>>>> commit message. Unless I'm missing something, of course...
>>>>>
>>>>> I should put it in the commit message, but there is a comment in
>>>>> rounding.h that
>>>>> if you change the rounding mode you get to keep the pieces.
>>>>
>>>> Well, this might regress performance pretty badly. Especially in the
>>>> swrast code, this could be bad...
>>>>
>>>
>>> Why? we have the assumption that you don't change the rounding mode already in
>>> core mesa and many of the drivers.
>>>
>>> For performance, I measured a simple 1000 loops of rounding, and found that the
>>> only way the rounding.h function was slower is if you used the __SSE4_1__
>>> path... (It was the same performance as the int cast +0.5 implementation)
>> FWIW I'm not entirely sure it's useful to have a sse41 implementation -
>> since all sse2 capable cpus can natively do rintf. Although maybe it
>> should be pointed out that the sse41 implementation will use a defined
>> rounding mode, whereas rintf will use current rounding mode. But I don't
>> think anyone ever cares for the results if a different rounding mode
>> would be set. Although of course rint and its variant do not actually
>> guarantee the even part of it (but well if it's a sse41 capable box we
>> pretty much know it would do just that anyway)... (And technically
>> nearbyintf would probably be an even better solution, since we never
>> want to get involved with the clunky exceptions, otherwise it's
>> identical. But there might be reasons why it isn't used.)
>>
>> Roland
> 
> I'm not convinced we want it either, since it seems to be slower than glibc's
> rintf. I guess it probably does make sense to use the nearbyintf instead.
> 
> As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION not
> check the rounding mode?
Oh indeed, I didn't check the code too closely (I was just assuming
_mm_round_ss() was used because it is possible to use round-to-nearest
regardless the actual rounding mode, but that's not the case).

But actually I misread this code: the point of mesa_roundevenf is to
round to float WITHOUT conversion to int. In which case it makes more
sense at least at first look...

But if you want to round to nearest integer WITH conversion to int, you
probably really want to use something else. nearbyint family doesn't
have variants which give you ints. There's rint functions which give you
ints directly, but they are likely a very bad idea (aside from exception
handling, not quite sure if this really causes the compiler to do
something different) because of giving you long (or long long) results -
meaning that you can't use the simple cpu instructions giving you 32bit
results (because conversion to 64bit long + trunc to 32bit will give you
defined (although meaningless) results in some cases where direct
conversion to 32bit int wouldn't).
So ideally you'd pick a variant where the compiler is smart enough to
recognize it can be done with a single instruction. I would guess
nearbyintf + int cast should do just about everywhere, at least as long
as x64 or x86 + sse2 is used, my suspicion is the old IROUND function
was done in a time where x87 was still relevant. Or maybe rintf + int
cast, no idea how the compiler really handles them differently (I tried
to quickly look at it in gcc source, but no idea where those are
buried). As a side note, I hate it when the assembly solution is obvious
and you can't really figure out how the hell you should coax the
compiler in giving you the right answer (I mean, high level languages
are there to help, not get in your way...).

All that said, I still don't really see the point of the manual sse41
assembly (even for the case when we don't want to convert to int) -
assuming there is an easy solution to get the compiler to do the right
thing...

Roland

> 
> Dylan
>