[Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf

Wed Nov 14 02:41:00 UTC 2018

Am 14.11.18 um 03:21 schrieb Matt Turner:
> On Tue, Nov 13, 2018 at 6:03 PM Roland Scheidegger <sroland at vmware.com> wrote:
>>
>> Am 13.11.18 um 23:49 schrieb Dylan Baker:
>>> Quoting Roland Scheidegger (2018-11-13 14:13:00)
>>>> Am 13.11.18 um 18:00 schrieb Dylan Baker:
>>>>> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
>>>>>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
>>>>>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
>>>>>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
>>>>>>>>> Which has the same behavior.
>>>>>>>>
>>>>>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest
>>>>>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds
>>>>>>>> to
>>>>>>>> the nearest *even* value regardless of the FPU rounding mode, no?
>>>>>>>>
>>>>>>>> I'm not sure if it matters or not, but *at least* point that out in
>>>>>>>> the
>>>>>>>> commit message. Unless I'm missing something, of course...
>>>>>>>
>>>>>>> I should put it in the commit message, but there is a comment in
>>>>>>> rounding.h that
>>>>>>> if you change the rounding mode you get to keep the pieces.
>>>>>>
>>>>>> Well, this might regress performance pretty badly. Especially in the
>>>>>> swrast code, this could be bad...
>>>>>>
>>>>>
>>>>> Why? we have the assumption that you don't change the rounding mode already in
>>>>> core mesa and many of the drivers.
>>>>>
>>>>> For performance, I measured a simple 1000 loops of rounding, and found that the
>>>>> only way the rounding.h function was slower is if you used the __SSE4_1__
>>>>> path... (It was the same performance as the int cast +0.5 implementation)
>>>> FWIW I'm not entirely sure it's useful to have a sse41 implementation -
>>>> since all sse2 capable cpus can natively do rintf. Although maybe it
>>>> should be pointed out that the sse41 implementation will use a defined
>>>> rounding mode, whereas rintf will use current rounding mode. But I don't
>>>> think anyone ever cares for the results if a different rounding mode
>>>> would be set. Although of course rint and its variant do not actually
>>>> guarantee the even part of it (but well if it's a sse41 capable box we
>>>> pretty much know it would do just that anyway)... (And technically
>>>> nearbyintf would probably be an even better solution, since we never
>>>> want to get involved with the clunky exceptions, otherwise it's
>>>> identical. But there might be reasons why it isn't used.)
>>>>
>>>> Roland
>>>
>>> I'm not convinced we want it either, since it seems to be slower than glibc's
>>> rintf. I guess it probably does make sense to use the nearbyintf instead.
>>>
>>> As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION not
>>> check the rounding mode?
>> Oh indeed, I didn't check the code too closely (I was just assuming
>> _mm_round_ss() was used because it is possible to use round-to-nearest
>> regardless the actual rounding mode, but that's not the case).
>>
>> But actually I misread this code: the point of mesa_roundevenf is to
>> round to float WITHOUT conversion to int. In which case it makes more
>> sense at least at first look...
>>
>> But if you want to round to nearest integer WITH conversion to int, you
>> probably really want to use something else. nearbyint family doesn't
>> have variants which give you ints. There's rint functions which give you
>> ints directly, but they are likely a very bad idea (aside from exception
> 
> Why?
Not sure what the why refers to here?

> 
>> handling, not quite sure if this really causes the compiler to do
>> something different) because of giving you long (or long long) results -
>> meaning that you can't use the simple cpu instructions giving you 32bit
>> results (because conversion to 64bit long + trunc to 32bit will give you
>> defined (although meaningless) results in some cases where direct
>> conversion to 32bit int wouldn't).
>> So ideally you'd pick a variant where the compiler is smart enough to
>> recognize it can be done with a single instruction. I would guess
>> nearbyintf + int cast should do just about everywhere, at least as long
>> as x64 or x86 + sse2 is used, my suspicion is the old IROUND function
>> was done in a time where x87 was still relevant. Or maybe rintf + int
>> cast, no idea how the compiler really handles them differently (I tried
>> to quickly look at it in gcc source, but no idea where those are
>> buried). As a side note, I hate it when the assembly solution is obvious
>> and you can't really figure out how the hell you should coax the
>> compiler in giving you the right answer (I mean, high level languages
>> are there to help, not get in your way...).
> 
> Please read the commit message of
> 
> commit dd0d3a2c0fb388745519c8a3be800720541eccfe
> Author: Matt Turner <mattst88 at gmail.com>
> Date:   Tue Mar 10 17:55:21 2015 -0700
> 
>     mesa: Replace _mesa_round_to_even() with _mesa_roundeven().
> 
> for a lot of the background.
> 
> I expect IROUND_POS can be replaced with the _mesa_lroundevenf function.
> 

Yes, I missed that function. _mesa_lroundevenf is exactly what we want
(although if someone cares about other archs, lrintf might generate
pretty bad code there too).

The commit message though isn't quite right there, saying "rint() and
nearbyint() implement the round-half-to-even behavior we want when the
rounding mode is set to the default round-to-nearest. The only
difference between them is that nearbyint() raises the inexact
exception."
And seemingly uses that to justify using rintf and not nearbyintf for
the fallback. It's the opposite, nearbyint never raises inexact
exception, whereas rint does (depending on fp environment). Or maybe not
(I suppose nearbyint could be more work because the mxcsr bits might
have to be set explicitly to NOT cause an exception).

Roland