[Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf

Roland Scheidegger sroland at vmware.com
Wed Nov 14 02:26:34 UTC 2018


Am 14.11.18 um 03:02 schrieb Roland Scheidegger:
> Am 13.11.18 um 23:49 schrieb Dylan Baker:
>> Quoting Roland Scheidegger (2018-11-13 14:13:00)
>>> Am 13.11.18 um 18:00 schrieb Dylan Baker:
>>>> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
>>>>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
>>>>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
>>>>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
>>>>>>>> Which has the same behavior.
>>>>>>>
>>>>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest
>>>>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds
>>>>>>> to
>>>>>>> the nearest *even* value regardless of the FPU rounding mode, no?
>>>>>>>
>>>>>>> I'm not sure if it matters or not, but *at least* point that out in
>>>>>>> the
>>>>>>> commit message. Unless I'm missing something, of course...
>>>>>>
>>>>>> I should put it in the commit message, but there is a comment in
>>>>>> rounding.h that
>>>>>> if you change the rounding mode you get to keep the pieces.
>>>>>
>>>>> Well, this might regress performance pretty badly. Especially in the
>>>>> swrast code, this could be bad...
>>>>>
>>>>
>>>> Why? we have the assumption that you don't change the rounding mode already in
>>>> core mesa and many of the drivers.
>>>>
>>>> For performance, I measured a simple 1000 loops of rounding, and found that the
>>>> only way the rounding.h function was slower is if you used the __SSE4_1__
>>>> path... (It was the same performance as the int cast +0.5 implementation)
>>> FWIW I'm not entirely sure it's useful to have a sse41 implementation -
>>> since all sse2 capable cpus can natively do rintf. Although maybe it
>>> should be pointed out that the sse41 implementation will use a defined
>>> rounding mode, whereas rintf will use current rounding mode. But I don't
>>> think anyone ever cares for the results if a different rounding mode
>>> would be set. Although of course rint and its variant do not actually
>>> guarantee the even part of it (but well if it's a sse41 capable box we
>>> pretty much know it would do just that anyway)... (And technically
>>> nearbyintf would probably be an even better solution, since we never
>>> want to get involved with the clunky exceptions, otherwise it's
>>> identical. But there might be reasons why it isn't used.)
>>>
>>> Roland
>>
>> I'm not convinced we want it either, since it seems to be slower than glibc's
>> rintf. I guess it probably does make sense to use the nearbyintf instead.
>>
>> As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION not
>> check the rounding mode?
> Oh indeed, I didn't check the code too closely (I was just assuming
> _mm_round_ss() was used because it is possible to use round-to-nearest
> regardless the actual rounding mode, but that's not the case).
> 
> But actually I misread this code: the point of mesa_roundevenf is to
> round to float WITHOUT conversion to int. In which case it makes more
> sense at least at first look...
> 
> But if you want to round to nearest integer WITH conversion to int, you
> probably really want to use something else. nearbyint family doesn't
> have variants which give you ints. There's rint functions which give you
> ints directly, but they are likely a very bad idea (aside from exception
> handling, not quite sure if this really causes the compiler to do
> something different) because of giving you long (or long long) results -
> meaning that you can't use the simple cpu instructions giving you 32bit
> results (because conversion to 64bit long + trunc to 32bit will give you
> defined (although meaningless) results in some cases where direct
> conversion to 32bit int wouldn't).
> So ideally you'd pick a variant where the compiler is smart enough to
> recognize it can be done with a single instruction. I would guess
> nearbyintf + int cast should do just about everywhere, at least as long
> as x64 or x86 + sse2 is used, my suspicion is the old IROUND function
> was done in a time where x87 was still relevant. Or maybe rintf + int
> cast, no idea how the compiler really handles them differently (I tried
> to quickly look at it in gcc source, but no idea where those are
> buried). As a side note, I hate it when the assembly solution is obvious
> and you can't really figure out how the hell you should coax the
> compiler in giving you the right answer (I mean, high level languages
> are there to help, not get in your way...).
> 
> All that said, I still don't really see the point of the manual sse41
> assembly (even for the case when we don't want to convert to int) -
> assuming there is an easy solution to get the compiler to do the right
> thing...

Err, I tried it out and was completely unable to come up with something
which wouldn't generate huge amounts of crap code (or library calls).
WTF. (But might depend on compiler, of course.)
So I guess maybe for round conversion to int you actually want to
manually do sse2 inline asm (cvtss2si). Although it's crazy.

Roland





More information about the mesa-dev mailing list