[Mesa-dev] [PATCH 16/28] Replace IROUND_POS with _mesa_roundevenf
Matt Turner
mattst88 at gmail.com
Fri Nov 16 19:04:52 UTC 2018
On Tue, Nov 13, 2018 at 6:41 PM Roland Scheidegger <sroland at vmware.com> wrote:
>
> Am 14.11.18 um 03:21 schrieb Matt Turner:
> > On Tue, Nov 13, 2018 at 6:03 PM Roland Scheidegger <sroland at vmware.com> wrote:
> >>
> >> Am 13.11.18 um 23:49 schrieb Dylan Baker:
> >>> Quoting Roland Scheidegger (2018-11-13 14:13:00)
> >>>> Am 13.11.18 um 18:00 schrieb Dylan Baker:
> >>>>> Quoting Erik Faye-Lund (2018-11-13 01:34:53)
> >>>>>> On Mon, 2018-11-12 at 09:22 -0800, Dylan Baker wrote:
> >>>>>>> Quoting Erik Faye-Lund (2018-11-12 04:51:47)
> >>>>>>>> On Fri, 2018-11-09 at 10:40 -0800, Dylan Baker wrote:
> >>>>>>>>> Which has the same behavior.
> >>>>>>>>
> >>>>>>>> Does it? I'm not so sure... IROUND_POS seems to round to nearest
> >>>>>>>> integer depending on the FPU rounding mode, _mesa_roundevenf rounds
> >>>>>>>> to
> >>>>>>>> the nearest *even* value regardless of the FPU rounding mode, no?
> >>>>>>>>
> >>>>>>>> I'm not sure if it matters or not, but *at least* point that out in
> >>>>>>>> the
> >>>>>>>> commit message. Unless I'm missing something, of course...
> >>>>>>>
> >>>>>>> I should put it in the commit message, but there is a comment in
> >>>>>>> rounding.h that
> >>>>>>> if you change the rounding mode you get to keep the pieces.
> >>>>>>
> >>>>>> Well, this might regress performance pretty badly. Especially in the
> >>>>>> swrast code, this could be bad...
> >>>>>>
> >>>>>
> >>>>> Why? we have the assumption that you don't change the rounding mode already in
> >>>>> core mesa and many of the drivers.
> >>>>>
> >>>>> For performance, I measured a simple 1000 loops of rounding, and found that the
> >>>>> only way the rounding.h function was slower is if you used the __SSE4_1__
> >>>>> path... (It was the same performance as the int cast +0.5 implementation)
> >>>> FWIW I'm not entirely sure it's useful to have a sse41 implementation -
> >>>> since all sse2 capable cpus can natively do rintf. Although maybe it
> >>>> should be pointed out that the sse41 implementation will use a defined
> >>>> rounding mode, whereas rintf will use current rounding mode. But I don't
> >>>> think anyone ever cares for the results if a different rounding mode
> >>>> would be set. Although of course rint and its variant do not actually
> >>>> guarantee the even part of it (but well if it's a sse41 capable box we
> >>>> pretty much know it would do just that anyway)... (And technically
> >>>> nearbyintf would probably be an even better solution, since we never
> >>>> want to get involved with the clunky exceptions, otherwise it's
> >>>> identical. But there might be reasons why it isn't used.)
> >>>>
> >>>> Roland
> >>>
> >>> I'm not convinced we want it either, since it seems to be slower than glibc's
> >>> rintf. I guess it probably does make sense to use the nearbyintf instead.
> >>>
> >>> As an aside (since I know 0 about assembly), does _MM_FROUND_CUR_DIRECTION not
> >>> check the rounding mode?
> >> Oh indeed, I didn't check the code too closely (I was just assuming
> >> _mm_round_ss() was used because it is possible to use round-to-nearest
> >> regardless the actual rounding mode, but that's not the case).
> >>
> >> But actually I misread this code: the point of mesa_roundevenf is to
> >> round to float WITHOUT conversion to int. In which case it makes more
> >> sense at least at first look...
> >>
> >> But if you want to round to nearest integer WITH conversion to int, you
> >> probably really want to use something else. nearbyint family doesn't
> >> have variants which give you ints. There's rint functions which give you
> >> ints directly, but they are likely a very bad idea (aside from exception
> >
> > Why?
> Not sure what the why refers to here?
Why is rint a very bad idea?
> >
> >> handling, not quite sure if this really causes the compiler to do
> >> something different) because of giving you long (or long long) results -
> >> meaning that you can't use the simple cpu instructions giving you 32bit
> >> results (because conversion to 64bit long + trunc to 32bit will give you
> >> defined (although meaningless) results in some cases where direct
> >> conversion to 32bit int wouldn't).
> >> So ideally you'd pick a variant where the compiler is smart enough to
> >> recognize it can be done with a single instruction. I would guess
> >> nearbyintf + int cast should do just about everywhere, at least as long
> >> as x64 or x86 + sse2 is used, my suspicion is the old IROUND function
> >> was done in a time where x87 was still relevant. Or maybe rintf + int
> >> cast, no idea how the compiler really handles them differently (I tried
> >> to quickly look at it in gcc source, but no idea where those are
> >> buried). As a side note, I hate it when the assembly solution is obvious
> >> and you can't really figure out how the hell you should coax the
> >> compiler in giving you the right answer (I mean, high level languages
> >> are there to help, not get in your way...).
> >
> > Please read the commit message of
> >
> > commit dd0d3a2c0fb388745519c8a3be800720541eccfe
> > Author: Matt Turner <mattst88 at gmail.com>
> > Date: Tue Mar 10 17:55:21 2015 -0700
> >
> > mesa: Replace _mesa_round_to_even() with _mesa_roundeven().
> >
> > for a lot of the background.
> >
> > I expect IROUND_POS can be replaced with the _mesa_lroundevenf function.
> >
>
> Yes, I missed that function. _mesa_lroundevenf is exactly what we want
> (although if someone cares about other archs, lrintf might generate
> pretty bad code there too).
>
> The commit message though isn't quite right there, saying "rint() and
> nearbyint() implement the round-half-to-even behavior we want when the
> rounding mode is set to the default round-to-nearest. The only
> difference between them is that nearbyint() raises the inexact
> exception."
> And seemingly uses that to justify using rintf and not nearbyintf for
> the fallback. It's the opposite, nearbyint never raises inexact
> exception, whereas rint does (depending on fp environment). Or maybe not
> (I suppose nearbyint could be more work because the mxcsr bits might
> have to be set explicitly to NOT cause an exception).
IIRC that's exactly right. nearbyint would do extra work to disable
the inexact exception and doesn't have an lnearbyint* equivalent.
Since we use -fno-math-errno -fno-trapping-math we get what we want
with lrint: a single instruction on amd64 without any floating-point
control word changes.
More information about the mesa-dev
mailing list